UsageStats - Strange i18n behaviour and missing data

Description of issue or problem I’m having:
UsageStats stopped working as of Dec 31 2021, last data collection for 2021 was done on Dec 30 2021 (archive/ stops there). For Jan 4 2022 only partial data was loaded, nothing else (not in archive/).

Steps I took leading up to the issue:
We have read several forum threads.
We have looked at the metrics and scheduled_tasks tables in the database. We have done a thorough analysis of the data in the filesystem and found things like this:

$ ls -1 scheduledTaskLogs/*20211231.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-61ce3a0214593-20211231.log
scheduledTaskLogs/Crossrefautomaticregistrationtask-61cf727d7c2f1-20211231.log
scheduledTaskLogs/DataCiteautomatischeRegistrierung-61cf726bbde9d-20211231.log
scheduledTaskLogs/DOAJautomatischeRegistrierung-61cf71dcdba5c-20211231.log
scheduledTaskLogs/ErinnerunganGutachten-61cf72f3b7ae2-20211231.log
scheduledTaskLogs/mEDRAautomatischeRegistrierung-61cf722d1faec-20211231.log
scheduledTaskLogs/PKPPNAblieferungsaufgabe-61cf1f31aa5ac-20211231.log
scheduledTaskLogs/SubscriptionExpiryReminder-61ce3a06074aa-20211231.log
scheduledTaskLogs/TaskzumLadenderNutzungsstatistik-Datei-61cf713f98a92-20211231.log

The “TaskzumLadenderNutzungsstatistik” sometimes is called “Usagestatisticsfileloadertask” instead.

We have several “##admin[…]” entries, 17 to be exact. Looks a lot like beginning and end of the month are special?

ls -1 scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-*
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-5fbe8245dc2d8-20201125.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-603e66923eff7-20210302.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-6063428ab1005-20210330.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-6064f0f2be63a-20210401.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-608c7de10b0f2-20210501.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-60b40ae31d2a7-20210531.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-60b55c63b561a-20210601.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-60dce961a629e-20210701.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-61047662e2d77-20210731.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-6105c7f4654d1-20210801.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-612ea66ce0d51-20210901.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-615633618786d-20211001.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-617dc066094a7-20211031.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-617f20008e131-20211101.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-61a6acf9dd13c-20211201.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-61ce3a0214593-20211231.log
scheduledTaskLogs/##admin.scheduledTask.statisticsReport##-61cf8b8293248-20220101.log

We also see a lot of data missing in archive/ which seems to be stuck in stage/.

What I tried to resolve the issue:

We moved one file for Jan 04 2022 from processing/ to stage/. We then ran runScheduledTasks manually on the commandline, to no avail.

Application Version - e.g., OJS 3.1.2:

OJS 3.2.1-3
Usage Stats 1.0

Additional information, such as screenshots and error log messages if applicable:

We have only seen the error message on data being in processing/, which is why we moved one file to stage/.

Additional remarks:

  • Why are the file names translated? Obviously this is very error-prone and makes it harder to debug things. Searching for translated error messages means I will have to translate them back to English, because searching in random language (in our case German) won’t give me any results.
  • Missing documention of this plugin is also a major concern for us. I have not found any information on where the data comes from, where it first starts, what all the directories are meant for and how to handle data that seems to be in the wrong place (except for the error message we saw in the log file). Describing the process in more detail anywhere would be much appreciated!

We have now moved all files from both stage/ and processing/ to a different location. We still have no messages in either scheduledTaskLogs nor the webserver logs that would indicate what is wrong or what we can do to amend this situation.

We are hoping that moving the old files away from stage will show us whether or not the currently produced new files are passing the process. If they do, we might add the old files little by little. If not, we do not have the slightest idea where to start debugging this mess.

(Gabriele)

Quick update: Yesterday’s stats were successfully processed and archived. We will now leave things as they are for the weekend to see if future days are also successfully processed. After that we might start including the old data from stage file by file and see if maybe the sheer amount of files in stage was the issue or if one specific file lead to the issues.

I am still underwhelmed by the logging behaviour: We see neither any info on what went wrong in scheduledTaskLogs nor in the PHP logs (at least not when grepping for the libs-pkp-path, as we have limited possibilities here).
(Gabriele)

Process has been running without issues for a few days (Jan 12-15 2022). We will now add the old files we moved away from stage one by one. Prior to that we will remove all lines - if any - that do not match the regular expression as per the plugin’s default settings (looks like a low amount of lines compared to the general file size).
(Gabriele)

We have seen another file fail last week and are now considering to reprocess the failing files after the next OJS upgrade. This is really not at all satisfactory, but the way the plugin behaves for us we don’t see another option currently.

Our last theory was that the regex is failing on some files and that reverse-grepping might help. But some files fail even though all lines match the regex.

@asmecher do you have any idea what can be done in this situation or who to ask?

(Gabriele)

Why are the file names translated?

You probably have Acron plugin doing the scheduled tasks. This means that a visit to any of the journals in your installation can trigger the scheduled task for processing the usageStats file and the language of that journal will determine the language of the log file.

We see the same thing in Journal.fi

It seems you have an installation with more than one journal? What kind of setup do you have? Are your journals under the same domain or have some custom setup?

What you write sounds reasonable (as in that’s how the system might work), but also disconcerting (that’s definitely not the way we want it to work). We definitely do not speak all the languages of the journals we host. In the scenario you described we will at some point not be able to read the log files.

Yes, we have a multijournal setup with more than one domain.

Thank you very much for getting back to us!
(Gabriele)

The Default Translation plugin (available in gallery) solves this case too, or just similar problems in the web interface?

I think that by default loading the English text in the absence of the translated term tends to be the best option for PKP applications.

Default Translation Plugin is on in our instance to my knowledge. I am not sure if all journals have it activated.

For server logfiles I would seriously expect English only. Translating those leads to the issues mentioned (strings in other languages can’t be searched as easily, sysadmin*as might not speak the languages the journals have activated).

(Gabriele)

I agree that it is a challenge. Even with more frequent changes in the translations than in the original terms, at least in our experience with journals in Portuguese.

Maybe it could at least be configurable so it could work as you indicated.

Personally I think changing file names depending on “which language was used when some arbitrary page was opened”, is the worst. I can not, as of now, make things like grep in all log files produced by usageStats. Except if I invest the time to find out all possible translations of the file name (!!) and which placeholder string it has, in case there is no translation available.
(Gabriele)

Using cron instead of acron plugin to run scheduled tasks doesn’t solve this?

Thank you for your suggestion! We would need to make sure that this would satisfy all jobs that are currently done via acron. I remember one of my colleagues saying they installed acron because one of our plugins required it. Depends on what exactly “require” means in this case. I will have to check which plugin that was and whether it can work with cron as well.
(Gabriele)