OJS Statistics plugin usage and regex

Dear all,

I have some issues with the Usage Statistics plugin.
First, the statistics logs are generated by OJS, and the data seem to be processed and read into the database, since the metrics table is well filled.

However, when I click Statistics → View Report in the OJS backend, all article and galley counts are zero, although the database holds higher numbers. This applies to all my journals, some of which run OJS 3.1.1.4 and others higher versions.

Furthermore, during my investigation I found the Usage Statistics plugin's regex for the logs to be quite flawed.
I tested the regex at regex101.com, first giving it the default plugin regex:

/^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" (?P<returnCode>\S+) \S+ ".*?" "(?P<userAgent>.*?)"/

(Please note that the leading and trailing slashes are left out on regex101.com, because the site already supplies them.)

And then some data:

4aae775ba286614662eb69572250572641201db19313d7bb12c2cf0d81701096 - - "2019-12-13 08:02:29" https://myojs.de/testling-3114/index.php/dark-universe/index 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0"

That does not give ANY result. Hence, I am surprised that there are any metric numbers in the database at all. I know that the regex is supposed to work with both Apache logs and the OJS statistics logs, but at least for the OJS statistics logs it does not work…
My impression is that the regex has multiple flaws, so I worked out one that does match:

/^(?P<ip>\S+) \S+ \S+ "(?P<date>.*?)" (?P<url>\S+).*? (?P<returnCode>\S+) "(?P<userAgent>.*?)"/
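
The regex101 check can also be repeated offline. Below is a minimal sketch in Python, whose `re` module accepts the same `(?P<name>...)` named groups. Python is used only for illustration (the plugin itself runs the pattern in PHP), and the variable names are my own, not anything from OJS.

```python
import re

# Default plugin pattern as quoted above (delimiting slashes dropped).
DEFAULT_REGEX = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" '
    r'(?P<returnCode>\S+) \S+ ".*?" "(?P<userAgent>.*?)"'
)

# Adjusted pattern for the layout of the OJS-internal log line.
ADJUSTED_REGEX = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ "(?P<date>.*?)" (?P<url>\S+).*? '
    r'(?P<returnCode>\S+) "(?P<userAgent>.*?)"'
)

# The sample line from this post, split only for readability.
SAMPLE_LINE = (
    '4aae775ba286614662eb69572250572641201db19313d7bb12c2cf0d81701096 - - '
    '"2019-12-13 08:02:29" '
    'https://myojs.de/testling-3114/index.php/dark-universe/index 200 '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) '
    'Gecko/20100101 Firefox/71.0"'
)

default_match = DEFAULT_REGEX.match(SAMPLE_LINE)
adjusted_match = ADJUSTED_REGEX.match(SAMPLE_LINE)

print("default pattern matches:", default_match is not None)
print("adjusted pattern matches:", adjusted_match is not None)

# Show what each named group captured, if the adjusted pattern matched.
if adjusted_match:
    for name, value in adjusted_match.groupdict().items():
        print(f"{name}: {value}")
```

The default pattern stops at the date, because it expects square brackets where this line has double quotes.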

Could a developer please tell me what is going on? Why are the numbers from the database not shown in the View Report, and why are there numbers in the database although the regex seems to be flawed?

Cheers,

Adrian

The regex will only be used if you are processing external log files (like Apache and Nginx logs) rather than the internal usage statistics logs generated within OJS.

So, if you are using the internal usage statistics logs, you’ll want to look in the files_dir directory under the “usageStats” directory. You should expect the internal logs stored in “usageEventLogs” to progress through the “stage”, “processing” and “archive” directories via the Scheduled Task “scheduledTasksAutoStage.xml”. This requires a cron job, or the Acron plugin to be active.
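
To see at a glance where the log files currently sit, something like the following sketch could help. It is only an illustration in Python, not part of OJS: the function name `count_usage_logs` and the path `/path/to/files_dir` are placeholders of mine, and it assumes the four directory names mentioned above live directly under “usageStats”.

```python
from pathlib import Path

# Sub-directories named in this thread; anything else is ignored.
STAGE_DIRS = ("usageEventLogs", "stage", "processing", "archive")


def count_usage_logs(files_dir):
    """Return {sub-directory: number of files} under <files_dir>/usageStats."""
    base = Path(files_dir) / "usageStats"
    counts = {}
    for name in STAGE_DIRS:
        folder = base / name
        # A missing folder is reported as None rather than 0.
        counts[name] = (
            sum(1 for entry in folder.iterdir() if entry.is_file())
            if folder.is_dir()
            else None
        )
    return counts


if __name__ == "__main__":
    # Hypothetical path: use the files_dir value from your config.inc.php.
    for name, count in count_usage_logs("/path/to/files_dir").items():
        print(f"{name}: {count}")
```

If files pile up in “usageEventLogs” or “stage” and never reach “archive”, that may be a hint that the scheduled task is not running.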

If you are using external log files from your webserver, you will need to adjust the regexp to your log file format manually (by default it is configured for the Apache “combined” log format), and then move the log files into the “stage” directory for processing by the “scheduledTasksExternalLogFiles.xml”. This requires a cron job.
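
As a rough check of that default, the sketch below runs the default pattern quoted earlier in this thread against the sample entry that the Apache documentation gives for the combined format. Python is used only for illustration (its `re` module understands the same named groups); the names `PLUGIN_DEFAULT` and `COMBINED_SAMPLE` are mine, and your own log lines may well differ from this sample.

```python
import re

# Default plugin pattern as quoted earlier in the thread (slashes dropped).
PLUGIN_DEFAULT = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" '
    r'(?P<returnCode>\S+) \S+ ".*?" "(?P<userAgent>.*?)"'
)

# Example entry for the "combined" format from the Apache documentation.
COMBINED_SAMPLE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

combined_match = PLUGIN_DEFAULT.match(COMBINED_SAMPLE)

if combined_match:
    for name, value in combined_match.groupdict().items():
        print(f"{name}: {value}")
else:
    print("no match: the pattern needs adjusting for this log format")
```

Pasting a line from your own access log in place of `COMBINED_SAMPLE` shows quickly whether the regexp needs adjusting before you move files into “stage”.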

In either case, you will find logs of past processing actions in your “scheduledTaskLogs” folder in files_dir.

This is described more fully in the documentation here:
https://docs.pkp.sfu.ca/admin-guide/en/statistics#appendix-c-processing-log-files

Thank you for the insight!

That explains the regex issue.

However, what about the statistics report for the editors? From the given link, I learned that the “View Report” is only usable for legacy data. This may explain why there are only zeros in this report. But there is no “Usage Statistics Report” to choose from in the Statistics menu, as described in the documentation (Statistics). So where can my editors get their numbers from?

Thanks a lot! :slight_smile:

The legacy “Views Report” was replaced by COUNTER-ready reporting some years ago. The underlying COUNTER-ready data feeds multiple reports, including the PKP Usage Statistics Report, the COUNTER reports, and the View Report options from within the Statistics Menu. In 3.1.2, this should look something like:
[screenshot]

I recommend checking your scheduled task logs to see if processing for usage statistics is happening as you expect.

Editorial reports are currently under development for 3.2. You can see a preview of what this will look like here:
https://github.com/pkp/pkp-lib/issues/4844#issuecomment-523395401

You just saved me with this.
Our statistics stopped working months ago, although no error was shown.
I copied one line from our log and tried to match it against the regex (the default one; nobody had changed it) and, to my surprise, it didn’t match!
And it’s the default Apache LogFormat; we didn’t change that either.
I googled whether anyone had another regex, and yours came up. I changed the config yesterday, moved the logs to the stage folder, and today everything is working again! :smiley:

P.S.: I did try just moving the logs without changing the regex, and it didn’t work. The logs just got archived without any new statistics.

Cool! Good to hear that I am not alone with my problems and that others can benefit from my “research”. ^^
