Serious performance issue regarding caching - OJS 3

Hi,

There have been several instances on my server where the load goes above 100. After researching this and carefully observing the MySQL process list to see why it happens, this is what I’ve discovered:

Any time the Data Cache and Template Cache are cleared, the first visit to the journal page triggers a batch of MySQL queries that usually take a decent amount of time to complete (considering there are over 6 million rows in my OJS instance).

The problem isn’t that these caching queries take place, but that when multiple users visit the website at the same time, or while this process is still running, multiple instances of the cache-refreshing queries are created, which floods the database. From what I’ve experienced on my journals, this also becomes a loop that keeps going for hours and ultimately takes down all my websites.

Can someone please share a permanent fix or patch for this? How do I disable this loop of caching, or is there another solution?

Looking forward to your replies.

Best Regards,
Jaimin

Hi @Jaimin,

Could you please indicate the specific version of OJS you’re using (e.g. 3.3.0-13)?

-Roger
PKP Team

Hi @rcgillis,

That’s correct; I’m using the latest version, 3.3.0-13.

Regards.

One more observation I recently made:

Every time I try to load the journal’s sitemap, it sends these cache queries all over again.

Hi @Jaimin,

In any system using caching, when you clear the caches, it’ll need to re-generate them. During that time period, the system will necessarily perform worse than if it had the caches already present.

A few thoughts…

  1. Are you regularly clearing the data caches? This shouldn’t be necessary.
  2. Have you investigated the database queries that are causing the system to experience performance problems? It’s possible that you have an underlying database performance issue, such as tables using mixed storage engines (InnoDB vs. MyISAM).
  3. If performance problems persist, and you do need to clear the data cache for some reason, you might want to re-generate the caches while the system is temporarily offline to outside users. To do this, for example, you could temporarily introduce a .htaccess rule that blocks access to anybody but your IP address; then click around the site (or use a series of wget commands) to cause the most important cache files to be re-generated (a minimal sketch follows below). Then open up access to all other users. This will give OJS a chance to generate the cache files before a lot of users rush in simultaneously.
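
For illustration, here is a minimal sketch of that approach, assuming an Apache 2.4 server; 203.0.113.10 and example.com/journal are placeholders for your own IP address and journal URL:

# .htaccess: temporarily allow only your own IP while the caches are rebuilt
Require ip 203.0.113.10

# Then, from a shell, warm the most important pages before re-opening access:
wget -q -O /dev/null https://example.com/index.php/journal/index
wget -q -O /dev/null https://example.com/index.php/journal/issue/current
wget -q -O /dev/null https://example.com/index.php/journal/issue/archive

Once the caches are back in place, remove the Require rule again.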

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

I have no idea why this is happening; we do not clear data caches at all from our end (only I have admin access).

  1. Never
  2. All 12 journals that I manage use only the InnoDB engine (all tables); I had manually run a command to convert them before upgrading to 3.3.0-13 (an example of the kind of statement I used is below, after this list).
  3. I have been having to do this for a long time now, and I’ve had sleepless nights because of the downtime that keeps occurring since the upgrade.
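
For reference (a rough sketch only; author_settings is just an example table name), the conversion was done with per-table statements along these lines, plus a check that nothing is left on MyISAM:

ALTER TABLE author_settings ENGINE=InnoDB;

SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE() AND engine = 'MyISAM';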

Any idea why this did not happen to us in OJS 2.4 but does in 3.3? Are there any particular plugins that force cache clearing?

This cache clearing takes place in every journal at least once every 24 hours or so; is there some plugin or page in the installation that triggers it?

I’m sharing some of the queries that run while this happens:

135510 Prepare 0 starting SELECT cve.* FROM controlled_vocab_entries cve WHERE cve.controlled_vocab_id = ? ORDER BY seq
135632 Execute 0 Sending data SELECT * FROM author_settings WHERE author_id = ?
135923 Execute 1 closing tables select locale from submissions as s where s.submission_id = ? limit 1
136844 Execute 0 Sending to client SELECT * FROM author_settings WHERE author_id = ?
137087 Execute 0 Sending to client SELECT * FROM author_settings WHERE author_id = ?
137565 Execute 12 Sending data SELECT COUNT(*) AS row_count FROM (SELECT a.*, ug.show_title, s.locale, COALESCE(agl.setting_val
137889 Execute 0 closing tables SELECT cve.* FROM controlled_vocab_entries cve WHERE cve.controlled_vocab_id = ? ORDER BY seq
138125 Execute 0 statistics SELECT * FROM author_settings WHERE author_id = ?
138129 Execute 0 statistics SELECT * FROM controlled_vocabs WHERE symbolic = ? AND assoc_type = ? AND assoc_id = ?
138139 Execute 147 Creating sort index SELECT a.*, ug.show_title, s.locale, COALESCE(agl.setting_value, agpl.setting_value) AS author_g
138312 Execute 9 Sending data select s.* from submissions as s left join publications as issue_p on issue_p.`submissio
138418 Execute 9 Sending data select *, s.locale as submission_locale from authors as a left join publications as p

Can you share a fix that I can implement to resolve this issue? It would be much appreciated.

Please let me know if you need any more info from my end.

Could you share the portion of the code/file that I could comment out to temporarily disable this caching process (perhaps to test whether the problem really lies in this process)?

Hi @Jaimin,

Let’s pick one entry that shows up a few times and see if it suggests a problem. I see that the following query is listed several times:

SELECT * FROM author_settings WHERE author_id = ?

This should be a very quick query, served by an index.

What do you get for the following database query?

SHOW INDEXES FROM author_settings;

For a clean OJS 3.3.0-x installation, the result should look like:

+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name                  | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_settings |          0 | author_settings_pkey      |            1 | author_id    | A         |          51 |     NULL | NULL   |      | BTREE      |         |               |
| author_settings |          0 | author_settings_pkey      |            2 | locale       | A         |         154 |     NULL | NULL   |      | BTREE      |         |               |
| author_settings |          0 | author_settings_pkey      |            3 | setting_name | A         |         154 |     NULL | NULL   |      | BTREE      |         |               |
| author_settings |          1 | author_settings_author_id |            1 | author_id    | A         |          51 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

As you can see, there is an index on the author_id column that should serve the above query. I wonder if it’s missing.
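
If you want to double-check that the index is actually used, an EXPLAIN on the query (with an arbitrary placeholder ID substituted) should report one of those keys rather than a full scan:

EXPLAIN SELECT * FROM author_settings WHERE author_id = 1;
-- "key" should show author_settings_pkey or author_settings_author_id;
-- "type: ALL" with a large "rows" estimate would indicate a full table scan.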

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

Sharing my output below:

+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------------+
| Table           | Non_unique | Key_name                  | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Index_type |
+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------------+
| author_settings |          0 | author_settings_pkey      |            1 | author_id    | A         |       29874 |     NULL | NULL   | BTREE      |
| author_settings |          0 | author_settings_pkey      |            2 | locale       | A         |       60621 |     NULL | NULL   | BTREE      |
| author_settings |          0 | author_settings_pkey      |            3 | setting_name | A         |      219467 |     NULL | NULL   | BTREE      |
| author_settings |          1 | author_settings_author_id |            1 | author_id    | A         |       29961 |     NULL | NULL   | BTREE      |
+-----------------+------------+---------------------------+--------------+--------------+-----------+-------------+----------+--------+------------+

(It only shows up a few times above because there were multiple instances of the query loop running; a lot more queries showed up in total.)

Waiting for your reply.

Regards.

Hi @Jaimin,

OK, that looks as it should, thanks.

Looking at your list above, I see:

138139 	Execute 	147 	Creating sort index 	SELECT a.*, ug.show_title, s.locale, COALESCE(agl.setting_value, agpl.setting_value) AS author_g

This is related to the author index, which is legacy code and is known to perform poorly. (It will be removed entirely starting with 3.4.0.) See this related issue: googleSearch report a lot of slowpages linking to author's search · Issue #7183 · pkp/pkp-lib · GitHub

If you see a lot of requests for /search/authors (e.g. from search engines) in your access log around the time that performance degrades, it’s likely related to this. There is a suggestion in that issue on how to quickly disable it.
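
The change described in that issue is the one to follow; purely as a rough stopgap sketch (not the modification from the issue), the route can also be blocked at the web-server level, e.g. with Apache’s mod_rewrite in .htaccess:

# Answer any author-index request with a 404 at the web server (rough stopgap only)
RewriteEngine On
RewriteCond %{REQUEST_URI} /search/authors
RewriteRule .* - [R=404,L]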

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

I appreciate your prompt replies. I tried using those direct author search links, and they do appear to eat up a lot of memory and send the queries I could see above. I’ve implemented the fix you’ve shared and will wait and observe whether it resolves the issue completely.

Additionally, I’d like to bring to your notice that the sitemap also eats up memory in much the same way, so I disabled it on my instance (and created my own code to produce a sitemap). This is something that could be worked on for OJS 3.4, if it hasn’t been already.

I will keep you updated on whether this turns out to be a permanent solution for my instances.

Best Regards,
Jaimin

Hi @Jaimin,

The sitemap is currently coded to generate a complete list of all submissions in a journal, and involves fetching a complete representation of each of the journal’s issues and submissions from the database. This will result in a lot of database queries, but each should be well indexed and thus quick, and as the results are iterated one at a time, it shouldn’t consume a lot of memory (beyond the DOM objects that represent the sitemap as it’s built).

The sitemap is intended for occasional fetching by legitimate search engines. While we could optimize it to reduce the number of queries involved (by coding a workaround to the object model), or introduce caching, both approaches have downsides that are keeping me from being too enthusiastic about them.

I’d suggest checking your access log to see whether it is being fetched enough to warrant changes, and if so, by whom. I suspect the author index is a much larger factor in the issues you’re encountering than the sitemap.
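
For example, assuming a standard combined-format Apache access log (the file name and path will vary with your hosting setup), something like this shows which clients and user agents are fetching the sitemap, and how often:

# Sitemap requests per client IP
grep 'GET /index\.php/[^ ]*/sitemap' access.log | awk '{print $1}' | sort | uniq -c | sort -rn

# ...and per user agent
grep 'GET /index\.php/[^ ]*/sitemap' access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn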

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

I want to add that your previous solution worked and our journals are running beautifully now. The load is back to what it used to be with 2.4.8. Thanks a lot for your help!

I understand; it was just a suggestion, though. :slight_smile:

I’m no professional programmer, but here is a sitemap workaround that I made to build a manual sitemap (it only takes a second to load), specific to my journals.

// All Articles
// ($sitemapdb, $baseurl and $tag are assumed to be set up earlier in the script.)

// publications.status = 3 is the "published" status in OJS 3.x.
$article_query = mysqli_query($sitemapdb, "SELECT * FROM publications WHERE status='3'");

if (mysqli_num_rows($article_query) > 0)
{
    while ($row = mysqli_fetch_assoc($article_query))
    {
        $articleid = $row['submission_id'];

        // Article landing page
        ?><url>
        <loc><?php echo $baseurl; ?>/index.php/<?php echo $tag; ?>/article/view/<?php echo $articleid; ?></loc>
      </url>
        <?php

        // First try a galley file that was not uploaded by user ID 1
        $double_check_query = mysqli_query($sitemapdb, "SELECT assoc_id AS checklink FROM submission_files WHERE submission_id = '".$articleid."' AND uploader_user_id != 1 LIMIT 1");

        if (mysqli_num_rows($double_check_query) > 0)
        {
            $get_dcheck = mysqli_fetch_assoc($double_check_query);
            ?>
                        <url>
                          <loc><?php echo $baseurl; ?>/index.php/<?php echo $tag; ?>/article/view/<?php echo $articleid; ?>/<?php echo $get_dcheck['checklink']; ?></loc>
                        </url>
            <?php
        }
        else
        {
            // Otherwise fall back to the lowest galley ID with genre_id = 1 (assumed here to be the article text/PDF genre)
            $pdf_query = mysqli_query($sitemapdb, "SELECT MIN(assoc_id) AS thepdflink FROM submission_files WHERE submission_id = '".$articleid."' AND assoc_id != '' AND genre_id = 1");
            $get_article = mysqli_fetch_assoc($pdf_query);
            // MIN() always returns one row, so check the value itself rather than the row count
            if (!empty($get_article['thepdflink']))
            {
                ?>
                        <url>
                          <loc><?php echo $baseurl; ?>/index.php/<?php echo $tag; ?>/article/view/<?php echo $articleid; ?>/<?php echo $get_article['thepdflink']; ?></loc>
                        </url>
                <?php
            }
        }
    }
}

I was trying to build this section to get the latest PDF ID, but it does not seem to work with the highest accuracy; any suggestion on what I could change here?

Best Regards,
Jaimin Kataria

Hi @Jaimin,

This is the downside to moving away from the standard object fetching tools that you’re finding to perform slowly – you’ll need to replicate in pure SQL some of the features and functions that are currently written in PHP. In the process you’ll lose important things like custom PDF and article IDs, etc. I’d recommend against this approach.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

Just an update so far: we’ve been able to run for days without any issues at all.

But there are times when the same queries start happening out of nowhere. This time they do not lock up the websites (like what happens when the data caches are cleared), but they create a flood of script instances, as seen before with the “/search/authors” requests.

Also, I checked the Google Search Console Server Error (5xx) logs and found that most of them came from the “/search/authors/” URLs, but I could not find anything else that was causing a slowdown.

Any idea what could be causing this? I’m not sure what additional info I could gather and share with you about what it might be linked to.

Best Regards,
Jaimin

Hi @Jaimin,

When you say they are the same queries, do you mean the same SQL queries as the screenshot above (e.g. on author_settings)?

If you’ve made the modification suggested by @ajnyga on googleSearch report a lot of slowpages linking to author's search · Issue #7183 · pkp/pkp-lib · GitHub, then the /search/authors URLs should respond with a 404, not a 5xx. Maybe you’re seeing old errors from before you made the change?

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

Thanks for your reply.

Regarding the “search/authors” modification, I meant to say that it has worked perfectly; it does return a 404 error. Also, the 5xx errors are in fact old.

These are the new queries that are showing up in a spammy manner; I was able to collect them today.

638 msusermanager localhost msuser_**** Execute 1 statistics SELECT * FROM publication_settings WHERE publication_id = ?
670 msusermanager localhost msuser_**** Execute 0 Sending to client SELECT * FROM publication_galley_settings WHERE galley_id = ?
974 msusermanager localhost msuser_**** Execute 0 Sending to client SELECT * FROM submission_settings WHERE submission_id = ?
974 msusermanager localhost msuser_**** Execute 0 Sending to client select * from publications as p where p.submission_id in (?) order by p.version asc
991 msusermanager localhost msuser_**** Execute 0 Sending to client select *, s.locale as submission_locale from authors as a left join publications as p
815 msusermanager localhost msuser_**** Execute 0 Sending to client select locale from submissions as s where s.submission_id = ? limit 1
885 msusermanager localhost msuser_**** Sleep 0 NULL
898 msusermanager localhost msuser_**** Prepare 0 Sending to client SELECT cve.* FROM controlled_vocab_entries cve WHERE cve.controlled_vocab_id = ? ORDER BY seq
885 msusermanager localhost msuser_**** Execute 0 Sending to client SELECT c.* FROM categories c INNER JOIN publication_categories pc ON (pc.category_id = c.categ
949 msusermanager localhost msuser_**** Execute 0 Sending to client SELECT * FROM controlled_vocabs WHERE symbolic = ? AND assoc_type = ? AND assoc_id = ?
974 msusermanager localhost msuser_**** Execute 0 statistics SELECT * FROM controlled_vocab_entry_settings WHERE controlled_vocab_entry_id = ?
991 msusermanager localhost msuser_**** Execute 0 Sending to client SELECT * FROM author_settings WHERE author_id = ?

Here’s the process manager log (each of these requests created a new instance of the queries above to run):

29550 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
31.87 0.56
29507 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
31.62 0.56
29564 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
31.09 0.55
29688 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
31.08 0.52
29759 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
30.25 0.54
29839 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
30.15 0.53
29815 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
29.88 0.53
29674 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
29.73 0.56
29869 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
28.69 0.53
29786 (Kill) **** 0 /opt/cpanel/ea-php74/root/usr/bin/php-cgi /home//public_html//index.php
28.12 0.53

Looking forward to your reply.

Best Regards.

Just an addition to my last reply.

Here is some information I could get from MySQL Tuner.

Joins performed without indexes: 144 (in 6 hours of running MySQL)

Any idea whether something could be missing from our database instances, or whether this is related to our performance issues?

Best Regards.

Sorry for the multiple replies; one more thing to note is that this does not happen on all the journals on my server (just 2 or 3 out of 12).

Hi @Jaimin,

Does your MySQL tuner show the queries associated with the unindexed joins? I would need those in order to investigate further.
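
If the tuner doesn’t retain them, one generic way to capture them (standard MySQL settings, nothing OJS-specific; enable with care on a production server and revert afterwards) is the slow query log with unindexed-join logging turned on:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                  -- log anything slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON'; -- also log joins that use no index
SHOW VARIABLES LIKE 'slow_query_log_file';       -- where the entries are written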

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,

I’m dropping the latest slow query log from today’s downtime below; it should help give you some insight.

https://pastebin.com/Q65W07Pc

Password for the link: MnZ1E81GwH

Looking forward to your reply; I hope this is something we can solve.

Best Regards.