Spikes in traffic / CPU (solved via robots.txt)

Over the last couple of days we experienced a couple of spikes in traffic and CPU usage, which we tracked down to an apparent change in the behaviour of web crawlers.
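For anyone wanting to check their own logs, tallying requests per user agent is usually enough to spot this kind of shift. A rough sketch in Python (the log path and combined log format are assumptions about our setup; adjust for yours):

from collections import Counter

# Tally requests per user agent from an Apache access log in combined
# format; the path below is an assumption, point it at your own log.
LOG = "/var/log/apache2/access.log"

agents = Counter()
with open(LOG, errors="replace") as fh:
    for line in fh:
        parts = line.split('"')
        if len(parts) >= 6:  # in combined format the user agent is the last quoted field
            agents[parts[-2]] += 1

for agent, hits in agents.most_common(15):
    print(f"{hits:8d}  {agent}")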

Since the spikes happened several times, we took mitigating action, changing robots.txt from:

User-agent: *
Disallow: /cache/

to

User-agent: *
Disallow: /cache/
Crawl-delay: 10

~24 hours after making the change we’re still seeing web crawlers, but in much lower numbers, and we’re no longer having issues with traffic / load spikes.
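For anyone making a similar change, the Python standard library’s robotparser is a quick way to sanity-check that the updated file parses the way you expect. A small sketch (substitute your own site’s URL):

from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.org/robots.txt")  # your site here
rp.read()
print(rp.can_fetch("*", "/cache/anything"))  # expect False
print(rp.crawl_delay("*"))                   # expect 10

Worth noting that not every crawler honours Crawl-delay (Googlebot, for one, ignores it), so the effect may vary depending on which bots are hitting you.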

cheers
stuart

Hi @stuart.yeates,

Thanks for posting! Are you seeing a lot of demand due to web crawler requests for the author search pages? If so, see Serious performance issue with regards to caching - OJS 3 - #9 by asmecher for some relevant suggestions. (These pages are deprecated in recent releases and will be removed starting with 3.4.0.)

Regards,
Alec Smecher
Public Knowledge Project Team

In this case the spikes in traffic appeared to be unrelated to author search. The load spike is primarily on the application server, not the MySQL server.

Hi @stuart.yeates,

Hmm, OJS is typically harder on the database server than on the application server. If you’re able to track down patterns in what’s driving the CPU load, I’d take a closer look. Meanwhile, your approach of slowing down crawlers is a good one.

Thanks,
Alec Smecher
Public Knowledge Project Team

What I’ve seen that could explain it:

A request to /search/authors hangs in MySQL. Several others queue up behind it, but they don’t cause MySQL to consume significant additional resources.

On the application side, each request to /search/authors sits waiting and, as a result, more and more PHP (or Apache) processes stay open until some limit is reached.
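One way to watch that pile-up happen is simply to count the worker processes over time. A rough sketch (the process names are assumptions; swap in httpd, php-fpm or whatever your stack runs):

import subprocess
import time

def count(name):
    # pgrep -c prints how many processes match the given name
    out = subprocess.run(["pgrep", "-c", name], capture_output=True, text=True)
    return int(out.stdout.strip() or 0)

while True:
    print(f"apache2: {count('apache2'):4d}   php-fpm: {count('php-fpm'):4d}")
    time.sleep(5)

If the Apache count climbs toward MaxRequestWorkers while MySQL stays quiet, that would match what we’re seeing.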

This topic was automatically closed after 10 days. New replies are no longer allowed.