I’ll let the experts weigh in, but I wanted to mention that I also noticed a statistics issue after moving to 3.5.
I’ve spent the last few weeks trying to understand what happened. I’m worried something is wrong with the institutional stats compiler, and I’ve been trying to verify things against access logs and database backups from both before and after my update to 3.5. I can’t say whether the problem came from the mysql dump and restore or from attempting to reprocess, since both occurred during overlapping work periods from late August to the present for me.
I was pulling COUNTER records last month when it became very clear that the records were no longer accurate. I’ve since reverted and taken other steps to get things fixed, especially since the reprocessor doesn’t back up the tables before it attempts to reprocess. This is key: if you have moved items from the archive directory to restage them and they fail, your stats are still gone. [To me this feels unwise, given that we don’t have great tools to judge the outcome IF you run a multi-installation. The built-in COUNTER stats report tool is still broken.]
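Since the reprocessor won’t back the tables up for you, I’d dump them yourself before every attempt. A minimal sketch (the metrics_* table names and the credentials are assumptions based on my reading of the 3.4/3.5 schema; run SHOW TABLES LIKE 'metrics%' on your database and adjust):

```shell
# Dump the stats tables before touching the reprocessor.
# Table list and credentials are assumptions -- check your own schema first
# with: SHOW TABLES LIKE 'metrics%';
mysqldump -u ojs_user -p ojs_db \
  metrics_counter_submission_daily \
  metrics_counter_submission_monthly \
  metrics_counter_submission_institution_daily \
  metrics_counter_submission_institution_monthly \
  > metrics_backup_$(date +%F).sql
```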
While investigating, I slightly modified things with additional logic to prevent missing issues, articles, and submissions (ones that had been deleted or permanently moved) from causing a sea of foreign key errors. The processor can really get jammed on an entire historical log of these, checking and rechecking for records your database can’t provide. At 60-80k entries per day, checked, say, three times each, all month long, that adds up.
So if one gets stuck in your queue, jobs can back up and fail. If you don’t spot it quickly and your backups aren’t solid, then I think you can easily lose data. This is especially true the larger your installation. Not sure whether ours qualifies, but we run a multi-installation with dozens of journals, each with plenty of institutional subscribers. I suspect that as the access logs cycled out and additional new statistics were waiting to compile and couldn’t be added to their final tables, the temp tables grew so large that the queries couldn’t avoid timing out. I don’t know whether this is a batching problem or a concurrency one, but the reprocessing script can be a real pain if the logs have too many problems or there are too many rows in the temporary tables.
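For what it’s worth, a rough way to see whether those temporary tables are ballooning is to ask information_schema for their row counts. The LIKE pattern, database name, and credentials below are assumptions; adjust them to match your schema (table_rows is only an estimate for InnoDB, but it’s fine for spotting runaway growth):

```shell
# Rough check on temp-table growth. Pattern and credentials are assumptions.
mysql -u ojs_user -p ojs_db -e "
  SELECT table_name, table_rows
  FROM information_schema.tables
  WHERE table_schema = 'ojs_db'
    AND table_name LIKE '%usage_stats%temporary%';"
```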
I want to mention a few things about how to track these issues. Baby steps for me! When you run reprocessUsageStatsMonth.php it does a bunch of things, but one that’s easy to observe is that files get picked up by the script. You can use something simple like this to watch the directories as it goes:
watch -n 5 "ls /var/www/ojs/files/usageStats/stage/*.log 2>/dev/null | wc -l; echo '---'; ls /var/www/ojs/files/usageStats/dispatch/*.log 2>/dev/null | wc -l; echo '---'; ls /var/www/ojs/files/usageStats/archive/*.log 2>/dev/null | wc -l"
Files should move from stage → dispatch → archive. If they’re stuck in stage, the worker isn’t running. If they’re in dispatch, they’re awaiting processing. The reprocess script pushes its work as a chain of commands about these files that only enter the jobs queue as they’re worked, so they are NOT visible via jobs.php --list yet, not until the chain the reprocessing script set in motion calls them. If the reprocess fails, the first sign is that not all your files make it across to archive; they’ll remain in dispatch.
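To turn that “stuck in dispatch” check into something you can script, here’s a small helper I’d use. It’s a hypothetical function, not an OJS tool; it just counts .log files that have sat in a directory longer than N minutes. The demo runs on a throwaway directory so the snippet is self-contained:

```shell
# Hypothetical helper (not part of OJS): count .log files older than N minutes.
# Point it at files/usageStats/dispatch to spot logs the reprocessor abandoned.
count_stale_logs() {
  # $1 = directory, $2 = age threshold in minutes
  find "$1" -name '*.log' -mmin +"$2" 2>/dev/null | wc -l
}

# Self-contained demo on a throwaway directory:
demo=$(mktemp -d)
touch "$demo/usage_events_20250801.log" "$demo/usage_events_20250802.log"
count_stale_logs "$demo" 60   # files were just created, so this prints 0
rm -rf "$demo"
```

On a live system you’d run something like `count_stale_logs /var/www/ojs/files/usageStats/dispatch 60`; anything nonzero after the reprocess has had time to finish is worth a look.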
It’s worth using the built-in jobs.php tool to watch these, too. You can see jobs arrive as files start to move across. It doesn’t happen right away, but if the worker you’re using is the one running the jobs queue directly, they’ll all be there. The initial reprocessing can take anywhere from 1-30 minutes. As a test, I set some very long timeouts for problematic logs using --timeout. That was part of the reason for tracking the files themselves: there’s no telling when the processing is going to finish.
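If you’d rather not poke the directories, the same watch trick works on the jobs tool itself. The subcommand names below are my understanding of the 3.4/3.5 CLI, so confirm them against `php lib/pkp/tools/jobs.php help` on your version first:

```shell
# Poll the jobs queue while a reprocess runs.
# Subcommand names may differ by OJS version; check "jobs.php help" first.
cd /var/www/ojs
watch -n 10 'php lib/pkp/tools/jobs.php total; php lib/pkp/tools/jobs.php list'
```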
It was easier to track these issues by carefully controlling a single worker and watching its output. I turned off the scheduled tasks and instead set up a supervisor to always keep a worker running. The reprocessing was creating failures and I kept wasting time not noticing them. The supervisor also logs the output for me, so I can check that everything went through and match the job IDs with those in the OJS UI. For me it’s actually easier to requeue statistics jobs from there than from the CLI. I can share a few of my scripts if that’s of interest.
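In case it helps, this is roughly the shape of the Supervisor program block I mean, a minimal sketch assuming a typical Debian-style layout; the install path, user, and log path are all assumptions to adjust for your server:

```ini
; /etc/supervisor/conf.d/ojs-worker.conf -- minimal sketch.
; Path, user, and logfile are assumptions for a typical Debian-style install.
[program:ojs-worker]
command=php /var/www/ojs/lib/pkp/tools/jobs.php work
directory=/var/www/ojs
user=www-data
autostart=true
autorestart=true
numprocs=1
redirect_stderr=true
stdout_logfile=/var/log/ojs-worker.log
```

Having everything funnel through one supervised worker is what made the failures visible; with multiple ad hoc workers I could never tell which one had eaten a bad log.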
Good luck!