Hello,
I am trying to import Apache logs into our OJS installation using the command php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksExternalLogFiles.xml.
=> However, when I look at the stats/publishedSubmissions page of a journal, the “pdf” and “times viewed” columns in the table below the graph all show 0 (nothing recorded), whereas the “abstract viewed” column does show positive integers.
Our OJS installation contains historic data (past years, going back to 2014) that was processed through the usage events log and cron. Those earlier years do contain data across all columns.
Using PHP’s var_dump, I’ve dug around in the UsageStatsLoader.inc.php file, feeling my way through the ingest process.
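To give an idea of the kind of probing I mean, here is a standalone sketch (not the plugin’s actual parsing code; the regex below is a generic Apache combined-log pattern, not the one the plugin uses):

<?php
// Standalone illustration: parse one Apache combined-log line and dump the
// captured fields, similar to the var_dump probing described above.
$sampleLine = '203.0.113.5 - - [07/Jun/2020:10:15:32 +0200] "GET /index.php/journal/article/view/123/456 HTTP/1.1" 200 4321 "-" "Mozilla/5.0"';

// Generic combined-log-format pattern; the usageStats plugin uses its own regex.
$pattern = '/^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<userAgent>[^"]*)"/';

if (preg_match($pattern, $sampleLine, $matches)) {
    // Keep only the named captures (ip, date, method, url, status, userAgent).
    var_dump(array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY));
} else {
    echo "Line did not match the expected format\n";
}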
Potential issues I’ve scratched off my list:
- It’s not the regex. The data is parsed correctly.
- It’s not permissions on the directories or the files in the usageStats folder (stage, archive, …).
- (I’m working in a local VM.) I’ve tried setting the base_url to the domain of the server where the application is hosted.
- The Apache logs contain parsable data, and I can see the data being picked up by the loader.
- I can see that the data is loaded into the metrics table with the ojs::counter metric_type. The load_id column refers to the log file I imported. Comparing with data from the usage_events logs, I don’t see anything missing (e.g. context_id, submission_id, … all contain data).
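Since the symptom is that file (galley) views stay at zero while abstract views do show up, the check I find most telling is grouping the imported metrics rows by assoc_type: as far as I understand, abstract views and galley/PDF downloads are stored under different assoc_type values, so if no file-level rows exist for the imported load_id, the “pdf” column can only be zero. A sketch of that check (the DSN, credentials and load_id value are placeholders for our installation):

<?php
// Sketch: count imported metrics rows per assoc_type for one imported log file.
$pdo = new PDO('pgsql:host=localhost;dbname=ojs', 'ojs_user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare(
    'SELECT assoc_type, metric_type, COUNT(*) AS rows_count, SUM(metric) AS total_metric
       FROM metrics
      WHERE load_id = :loadId
      GROUP BY assoc_type, metric_type
      ORDER BY assoc_type'
);
$stmt->execute([':loadId' => 'logfile_443_access_ssl.log-20200607']);

// If only one assoc_type shows up, only abstract views (or only file views)
// were recorded by this import.
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    printf(
        "assoc_type=%s metric_type=%s rows=%s total=%s\n",
        $row['assoc_type'], $row['metric_type'], $row['rows_count'], $row['total_metric']
    );
}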
During import via runScheduledTasks.php, I do get this cryptic warning on the CLI, but it doesn’t break the process:
PHP Warning: assert(): assert($submissionId > 0 || (int)$uploaderUserId || (int)$fileId || (int)$assocId) failed in /vagrant/ojs.ugent.be/lib/pkp/classes/submission/SubmissionFileDAO.inc.php on line 1003
After a run via the CLI, the processed log file ends up in the Archive directory. I have tried moving the same file back to the Stage directory and running php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasks.xml as a stopgap, but that just seems to execute the same logic, only using the regex that matches the internal “usage event” log format, so doing this appears to be redundant.
The scheduledTaskLogs directory does contain a log file for the run, which reads like this:
[2021-01-08 16:35:17] [Notice] Task process started.
[2021-01-08 16:35:19] [Warning] The line number 17607 from the file /opt/ojs-files/usageStats/processing/logfile_443_access_ssl.log-20200607 contains an url that the system can't remove the base url from.
[2021-01-08 16:35:22] [Warning] The line number 33611 from the file /opt/ojs-files/usageStats/processing/logfile_443_access_ssl.log-20200607 contains an url that the system can't remove the base url from.
...
[2021-01-08 16:36:41] [Notice] File /opt/ojs-files/usageStats/processing/logfile_443_access_ssl.log-20200607 was processed and archived.
[2021-01-08 16:36:41] [Notice] Task process stopped.
The warnings about unprocessable lines cover only a very small fraction of the entire logfile, so I conclude that the logfile, barring those few lines, was processed in full.
Background information:
- This installation runs OJS 3.1.2.1
- It’s an older installation that started out as 2.3.7.0 and received subsequent updates over the years, according to the system information (I inherited this installation).
- We migrated from MySQL to PostgreSQL at the end of 2019.
- We disabled logging via usage events in early 2020 because it caused performance issues on the PostgreSQL server, on the understanding that we could still import the Apache logs properly.
The goal here is to generate COUNTER statistics (XML download via OJS) for 2020 based on the Apache logs.
=> Are there any subsequent steps I need to take after importing the Apache logs to complete the processing of the log data?
Kind regards!