Statistics should gracefully handle malicious vulnerability scans

Describe the problem you would like to solve
Statistics should gracefully handle malicious vulnerability scans (not by us; presumably by a bad actor).

I was surprised to find that one of our journals had not processed its usage stats for a week. After I manually ran the processing commands, the stats for one issue showed a roughly 100-fold increase, so I inspected the log files. This is the third time this has happened for this journal, but I didn't investigate the previous occurrences.

The file was full of entries like these (site name replaced with xxx.xxx.org):

{"time":"2024-06-01 01:27:45","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"AJIJMmsU' OR 585=(SELECT 585 FROM PG_SLEEP(15))--","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
{"time":"2024-06-01 01:27:47","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"N71HNRvK'; waitfor delay '0:0:15' --","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}

{"time":"2024-06-01 01:27:10","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"if(now()=sysdate(),sleep(15),0)","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
{"time":"2024-06-01 01:27:10","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"-1\" OR 2+328-328-1=0+0+0+1 --","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}

(Note the userAgent field: these are classic time-based SQL injection probes.)
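To illustrate what a cleanup pass over such a log file could look like, here is a minimal sketch (not part of OJS; the pattern list and function names are my own invention) that parses the JSON-lines usage log and drops entries whose userAgent matches common time-based SQL injection probes like the ones above:

```python
import json
import re

# Illustrative patterns matching the probes seen in the log excerpt above;
# this is nowhere near an exhaustive scanner signature list.
SQLI_PATTERNS = [
    re.compile(r"PG_SLEEP\s*\(", re.IGNORECASE),
    re.compile(r"waitfor\s+delay", re.IGNORECASE),
    re.compile(r"sleep\s*\(\s*\d+\s*\)", re.IGNORECASE),
    re.compile(r"[\"']\s*OR\s+", re.IGNORECASE),
]


def is_suspicious(entry: dict) -> bool:
    """Return True if the entry's userAgent looks like an injection probe."""
    ua = entry.get("userAgent") or ""
    return any(p.search(ua) for p in SQLI_PATTERNS)


def filter_log(lines):
    """Yield only the log entries that do not look like scanner traffic."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if not is_suspicious(entry):
            yield entry
```

Filtering purely on the user agent is brittle, of course; a real scanner can send a legitimate-looking UA, which is why the per-IP heuristics below are probably the more robust angle.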

Describe the solution you’d like
It would be nice to clean up the statistics somewhat automatically.

I suppose it could work to add a blacklist that excludes an IP based on some heuristic (just riffing: an unreasonable number of distinct user agents, originating more than 50% of traffic, too many requests per minute, and the like).
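The heuristics riffed on above could be sketched roughly as follows. This is a hypothetical illustration only; the thresholds are made up, and the `ip` values are the salted hashes that already appear in the usage log, so no raw addresses are needed:

```python
from collections import defaultdict

# Made-up illustrative thresholds, not tuned values.
MAX_DISTINCT_USER_AGENTS = 20   # one IP cycling through many user agents
MAX_TRAFFIC_SHARE = 0.5         # one IP originating > 50% of all hits
MAX_REQUESTS_PER_MINUTE = 60    # burst rate from a single IP


def suspicious_ips(entries):
    """Return the set of (hashed) IPs that trip any of the heuristics."""
    agents_by_ip = defaultdict(set)
    hits_per_minute = defaultdict(lambda: defaultdict(int))
    hits_by_ip = defaultdict(int)
    total = 0

    for e in entries:
        ip = e["ip"]
        total += 1
        hits_by_ip[ip] += 1
        agents_by_ip[ip].add(e.get("userAgent") or "")
        minute = e["time"][:16]  # "YYYY-MM-DD HH:MM"
        hits_per_minute[ip][minute] += 1

    flagged = set()
    for ip, count in hits_by_ip.items():
        if len(agents_by_ip[ip]) > MAX_DISTINCT_USER_AGENTS:
            flagged.add(ip)
        elif total and count / total > MAX_TRAFFIC_SHARE:
            flagged.add(ip)
        elif max(hits_per_minute[ip].values()) > MAX_REQUESTS_PER_MINUTE:
            flagged.add(ip)
    return flagged
```

A pass like this could run before the usage-stats processing job, either dropping the flagged IPs' entries outright or quarantining them for manual review.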

Who is asking for this feature?
Technical support (users complain about weird statistics)

I'm not saying it's good practice, but considering that statistics are becoming a kind of measure of a journal's performance… this feature request makes a lot of sense.

I will add my vote as soon as I remove some of my earlier ones.

Cheers,
m.