Describe the problem you would like to solve
Statistics should gracefully handle malicious vulnerability scans (run not by us but, presumably, by a bad actor).
I was surprised to find that one of our journals had not processed its usage stats for a week. When I issued the commands manually, the stats for one issue came out about 100 times higher than expected, so I inspected the log files. This is the third time this has happened for this journal, but I didn't investigate the previous occurrences.
The file was full of entries like these (site name replaced with xxx.xxx.org):
{"time":"2024-06-01 01:27:45","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"AJIJMmsU' OR 585=(SELECT 585 FROM PG_SLEEP(15))--","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
{"time":"2024-06-01 01:27:47","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"N71HNRvK'; waitfor delay '0:0:15' --","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
{"time":"2024-06-01 01:27:10","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"if(now()=sysdate(),sleep(15),0)","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
{"time":"2024-06-01 01:27:10","ip":"a5c102bcaff71204a97cc30ba833c75521a42f2c4c74c0e7ce23728e858db5bf","userAgent":"-1\" OR 2+328-328-1=0+0+0+1 --","canonicalUrl":"https:\/\/xxx.xxx.org/xxx\/index","assocType":256,"contextId":16,"submissionId":null,"representationId":null,"submissionFileId":null,"fileType":null,"country":"NL","region":null,"city":"Amsterdam","institutionIds":[],"version":"3.4.0.5","issueId":null,"issueGalleyId":null}
(Note the userAgent field)
Describe the solution you’d like
It would be nice if the statistics could be cleaned up more or less automatically.
One option might be a blacklist that excludes an IP based on some heuristic. Just riffing, for example: an unreasonable number of user agents from one IP, one IP originating more than 50% of the traffic, too many requests per minute, and the like.
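To make the idea concrete, here is a rough sketch of those three heuristics in Python. It is only an illustration, not a proposal for how the application should implement it: the function names and thresholds are hypothetical, and it assumes the log file holds one JSON object per line with the `time`, `ip` and `userAgent` fields shown above. The `ip` value in those entries looks like a hash, so the sketch treats it as an opaque key.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical thresholds; they would need tuning against real traffic.
MAX_USER_AGENTS = 20      # distinct user agents seen for one ip value
MAX_TRAFFIC_SHARE = 0.5   # fraction of all entries coming from one ip value
MAX_PER_MINUTE = 60       # entries from one ip value within one clock minute


def suspicious_ips(lines, max_user_agents=MAX_USER_AGENTS,
                   max_share=MAX_TRAFFIC_SHARE, max_per_minute=MAX_PER_MINUTE):
    """Return {ip: [reasons]} for ip values that trip at least one heuristic."""
    agents = defaultdict(set)          # ip -> distinct userAgent strings
    per_minute = defaultdict(Counter)  # ip -> entry count per clock minute
    totals = Counter()                 # ip -> total entry count

    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        ip = entry["ip"]
        stamp = datetime.strptime(entry["time"], "%Y-%m-%d %H:%M:%S")
        agents[ip].add(entry["userAgent"])
        per_minute[ip][stamp.strftime("%Y-%m-%d %H:%M")] += 1
        totals[ip] += 1

    all_entries = sum(totals.values())
    flagged = {}
    for ip, count in totals.items():
        reasons = []
        if len(agents[ip]) > max_user_agents:
            reasons.append("too many user agents")
        if count / all_entries > max_share:
            reasons.append("dominates traffic")
        if max(per_minute[ip].values()) > max_per_minute:
            reasons.append("too many requests per minute")
        if reasons:
            flagged[ip] = reasons
    return flagged


def drop_flagged(lines, flagged):
    """Yield only the log lines whose ip value was not flagged."""
    for line in lines:
        if line.strip() and json.loads(line)["ip"] not in flagged:
            yield line
```

The traffic-share rule on its own would probably misfire on a low-traffic journal, where a single legitimate reader can account for most of a day's entries, so in practice it would likely need a minimum entry count or be combined with one of the other two rules.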
Who is asking for this feature?
Technical support (users complain about weird statistics)