I’ll let the experts weigh in, but I wanted to mention that I also noticed a statistics issue after moving to 3.5.
I’ve spent the last few weeks trying to understand what happened. I’m worried something is wrong with the institutional stats compiler, and I’ve been trying to verify things against access logs and database backups from both before and after my update to 3.5. I can’t say whether the problem came from the mysql dump and restore or from attempting to reprocess, since both occurred during overlapping work periods from late August to the present for me.
I was pulling COUNTER records last month when it became very clear that the records were no longer accurate. I’ve since reverted and taken other steps to get things fixed, especially since the reprocessor doesn’t back up the tables before it attempts to reprocess. This is key: if you have moved items from the archive directory to restage them and they fail, your stats are still gone. [To me this feels unwise, given that we don’t have great tools to judge the outcome IF you run a multi-installation. The built-in COUNTER stats report tool is still broken.]
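Since the reprocessor won’t back the tables up for you, I’d dump them yourself before every attempt. A minimal sketch (the metrics_* table names and the credentials are assumptions based on my reading of the 3.4/3.5 schema; run SHOW TABLES LIKE 'metrics%' on your database and adjust):

```shell
# Dump the stats tables before touching the reprocessor.
# Table list and credentials are assumptions -- check your own schema first
# with: SHOW TABLES LIKE 'metrics%';
mysqldump -u ojs_user -p ojs_db \
  metrics_counter_submission_daily \
  metrics_counter_submission_monthly \
  metrics_counter_submission_institution_daily \
  metrics_counter_submission_institution_monthly \
  > metrics_backup_$(date +%F).sql
```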
While investigating, I slightly modified things with additional logic to prevent missing issues, articles, and submissions (ones that had been deleted or permanently moved) from causing a sea of foreign key errors. The processor can really get jammed on an entire historical log of these, checking and rechecking for records your database can’t provide. At 60-80k entries per day, checked, say, three times each, all month long, that adds up.
So if one gets stuck in your queue, jobs can back up and fail. If you don’t spot it quickly and your backups aren’t solid, then I think you can easily lose data. This is especially true the larger your installation. Not sure whether ours qualifies, but we run a multi-installation with dozens of journals, each with plenty of institutional subscribers. I suspect that as the access logs cycled out and additional new statistics were waiting to compile and couldn’t be added to their final tables, the temp tables grew so large that the queries couldn’t avoid timing out. I don’t know whether this is a batching problem or a concurrency one, but the reprocessing script can be a real pain if the logs have too many problems or there are too many rows in the temporary tables.
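For what it’s worth, a rough way to see whether those temporary tables are ballooning is to ask information_schema for their row counts. The LIKE pattern, database name, and credentials below are assumptions; adjust them to match your schema (table_rows is only an estimate for InnoDB, but it’s fine for spotting runaway growth):

```shell
# Rough check on temp-table growth. Pattern and credentials are assumptions.
mysql -u ojs_user -p ojs_db -e "
  SELECT table_name, table_rows
  FROM information_schema.tables
  WHERE table_schema = 'ojs_db'
    AND table_name LIKE '%usage_stats%temporary%';"
```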
I want to mention a few things about how to track these issues. Baby steps for me! When you run reprocessUsageStatsMonth.php it does a bunch of things, but one that’s easy to observe is that files get picked up by the script. You can use something simple like this to watch the directories as it goes:
watch -n 5 "ls /var/www/ojs/files/usageStats/stage/*.log 2>/dev/null | wc -l; echo '---'; ls /var/www/ojs/files/usageStats/dispatch/*.log 2>/dev/null | wc -l; echo '---'; ls /var/www/ojs/files/usageStats/archive/*.log 2>/dev/null | wc -l"
Files should move from stage → dispatch → archive. If they’re stuck in stage, the worker isn’t running. If they’re in dispatch, they’re awaiting processing. The reprocess script pushes its work as a chain of commands about these files that only enter the jobs queue as they’re worked, so they are NOT visible via jobs.php --list yet, not until the chain the reprocessing script set in motion calls them. If the reprocess fails, the first sign is that not all your files make it across to archive; they’ll remain in dispatch.
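To turn that “stuck in dispatch” check into something you can script, here’s a small helper I’d use. It’s a hypothetical function, not an OJS tool; it just counts .log files that have sat in a directory longer than N minutes. The demo runs on a throwaway directory so the snippet is self-contained:

```shell
# Hypothetical helper (not part of OJS): count .log files older than N minutes.
# Point it at files/usageStats/dispatch to spot logs the reprocessor abandoned.
count_stale_logs() {
  # $1 = directory, $2 = age threshold in minutes
  find "$1" -name '*.log' -mmin +"$2" 2>/dev/null | wc -l
}

# Self-contained demo on a throwaway directory:
demo=$(mktemp -d)
touch "$demo/usage_events_20250801.log" "$demo/usage_events_20250802.log"
count_stale_logs "$demo" 60   # files were just created, so this prints 0
rm -rf "$demo"
```

On a live system you’d run something like `count_stale_logs /var/www/ojs/files/usageStats/dispatch 60`; anything nonzero after the reprocess has had time to finish is worth a look.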
It’s worth using the built-in jobs.php tool to watch these, too. You can see jobs arrive as files start to move across. It doesn’t happen right away, but if the worker you’re using is the one running the jobs queue directly, they’ll all be there. The initial reprocessing can take anywhere from 1-30 minutes. As a test, I set some very long timeouts for problematic logs using --timeout. That was part of the reason for tracking the files themselves: there’s no telling when the processing is going to finish.
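If you’d rather not poke the directories, the same watch trick works on the jobs tool itself. The subcommand names below are my understanding of the 3.4/3.5 CLI, so confirm them against `php lib/pkp/tools/jobs.php help` on your version first:

```shell
# Poll the jobs queue while a reprocess runs.
# Subcommand names may differ by OJS version; check "jobs.php help" first.
cd /var/www/ojs
watch -n 10 'php lib/pkp/tools/jobs.php total; php lib/pkp/tools/jobs.php list'
```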
It was easier to track these issues by carefully controlling a single worker and watching its output. I turned off the scheduled tasks and instead set up a supervisor to always keep a worker running. The reprocessing was creating failures and I kept wasting time not noticing them. The supervisor also logs the output for me, so I can check that everything went through and match the job IDs with those in the OJS UI. For me it’s actually easier to requeue statistics jobs from there than from the CLI. I can share a few of my scripts if that’s of interest.
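In case it helps, this is roughly the shape of the Supervisor program block I mean, a minimal sketch assuming a typical Debian-style layout; the install path, user, and log path are all assumptions to adjust for your server:

```ini
; /etc/supervisor/conf.d/ojs-worker.conf -- minimal sketch.
; Path, user, and logfile are assumptions for a typical Debian-style install.
[program:ojs-worker]
command=php /var/www/ojs/lib/pkp/tools/jobs.php work
directory=/var/www/ojs
user=www-data
autostart=true
autorestart=true
numprocs=1
redirect_stderr=true
stdout_logfile=/var/log/ojs-worker.log
```

Having everything funnel through one supervised worker is what made the failures visible; with multiple ad hoc workers I could never tell which one had eaten a bad log.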
Good luck!