Usage statistic files are stuck in 'processing' folder

Hi,

We have been using OJS3 for the last couple of months with no issues, but our usage statistics have not been updated for the last month, for no obvious reason.

I checked and there are many files in the ‘usageStats/processing’ folder. I moved them to the ‘stage’ folder, removed the entry from the DB, and re-ran the stats PHP script manually, following the manual.

The files were copied to the ‘processing’ folder again, but nothing happens after that and the DB is not updated!

Any ideas?!

Hi @alirezaaa

There will be corresponding log entries in the scheduledTaskLogs folder that will have an error message indicating the problem. Look for a file beginning with Usagestatisticsfileloadertask, which should end in a date.
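
A minimal sketch of that log check, assuming the logs are plain text files and that the directory path is supplied by whoever runs it (the function name `find_task_errors` is made up for illustration). It lists every `[Error]` line from files whose names start with the task prefix, ignoring case:

```python
from pathlib import Path


def find_task_errors(log_dir, prefix="usagestatisticsfileloadertask"):
    """Return (file name, line) pairs for every [Error] line in matching task logs."""
    errors = []
    for path in sorted(Path(log_dir).iterdir()):
        # Compare case-insensitively, since the capitalisation of the file name may vary.
        if not path.is_file() or not path.name.lower().startswith(prefix):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if "[Error]" in line:
                errors.append((path.name, line.strip()))
    return errors
```

Calling `find_task_errors(...)` with the path to the scheduledTaskLogs folder returns the error lines in file name order, which makes it easier to spot the first run that failed.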

Cheers,
Jason

Hi @jnugent,

Thank you for your reply.
I checked and it seems that the last successful log was processed on 2018-11-08

[2018-11-08 11:23:34] http://....

[2018-11-08 11:23:34] [Notice] Task process started.

[2018-11-08 11:23:36] [Notice] File /.../files/usageStats/processing/usage_events_20181107.log was processed and archived.

[2018-11-08 11:23:36] [Notice] Task process stopped.

Then on 2018-11-09 I see

[2018-11-09 11:24:00] http://....

[2018-11-09 11:24:00] [Notice] Task process started.

and since then, there is always an error until today:

[2018-11-10 11:26:49] http://...

[2018-11-10 11:26:49] [Notice] Task process started.

[2018-11-10 11:26:49] [Error] The directory /.../files/usageStats/processing is not empty. This could indicate a previously failed process, or a concurrently running process. This file will be automatically reprocessed if you are also using scheduledTasksAutoStage.xml, otherwise you will need to manually move any orphaned files in the processing directory back into the stage directory.

Should I copy all the archived files since 2018-11-08 back to the stage folder and re-run the job?

Regards,

Hi @alirezaaa

It sounds like there was a file on or around 11-08 or 11-09 that didn’t finish cleanly. If the 11-10 process is failing, try moving the 11-09 file out of the way temporarily, and then run the stats on the files after 11-09. You probably don’t need to rerun the older files (although it doesn’t hurt). If the new files go to completion (with an empty stage directory after) you can try running just the file that you moved, which failed the first time. It might just be a corrupt line or something which can be a bit of a pain to find.
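
The file shuffling described above could be scripted roughly as follows. This is only a sketch: the function name `restage` and the separate quarantine folder are assumptions for illustration, the directory paths must be supplied by the caller, and the stage folder is assumed to exist already. It moves suspect files aside and returns everything else from processing to stage:

```python
import shutil
from pathlib import Path


def restage(processing_dir, stage_dir, quarantine_dir, suspect_names):
    """Move orphaned logs out of processing: suspects to quarantine, the rest to stage."""
    processing = Path(processing_dir)
    stage = Path(stage_dir)
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = {"stage": [], "quarantine": []}
    # sorted() builds the full list first, so moving files does not disturb the loop.
    for path in sorted(processing.glob("usage_events_*.log")):
        target = "quarantine" if path.name in suspect_names else "stage"
        dest_dir = quarantine if target == "quarantine" else stage
        shutil.move(str(path), str(dest_dir / path.name))
        moved[target].append(path.name)
    return moved
```

Once the remaining files have been processed cleanly, the quarantined file can be copied back into stage on its own to see whether it is the one that fails.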

Cheers,

Hi @jnugent,

The problem seems to be in the last part of the usage log line, for example:

...115.215.209 - - "2019-05-10 00:37:47" http://.../article/view/143 200 "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"

If I edit it like this it works:

....115.215.209 - - "2019-05-10 00:37:47" http://..../article/view/143 200 ""

But I don’t see anything wrong in the last part and it should just work fine.

Any ideas?

Hi again,

Did you need to edit every entry in that file, or just one? The code parses the log files using a regular expression, and there might be a chance that the regular expression isn’t matching something or getting hung up.
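
One way to narrow this down is to run each line of a log file through a pattern and list the lines it rejects. The pattern below is an assumption written to fit the sample lines in this thread (IP, two placeholder fields, quoted date, URL, status code, quoted user agent); it is not necessarily the regular expression the plugin uses, and a line that passes here could still fail there:

```python
import re

# Assumed pattern, modelled on the sample log lines quoted in this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ "(?P<date>.*?)" (?P<url>\S+) (?P<status>\S+) "(?P<user_agent>.*?)"'
)


def unmatched_lines(lines):
    """Return (line number, text) for every non-blank line the pattern rejects."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text and LOG_LINE.match(text) is None:
            bad.append((number, text))
    return bad
```

Passing an open file object, for example `unmatched_lines(open(path, encoding="utf-8"))`, gives the line numbers worth inspecting by hand.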

Cheers,

Hi @jnugent,

Different lines need to be edited in different files, but there is usually just one line that needs to be edited.

Is it ok if I edit all lines of all files and replace the last part with just ""?

Can you tell me which file has the corresponding regex?

Hi @alirezaaa

No, you shouldn’t strip out the User Agents - they are used by the UsageStats plugin for bot detection, so you don’t get false positives in your reports. The regex can be defined in the settings for the UsageStats plugin, or else see the UsageStatsLoader class and _getDataFromLogEntry() method within.

Cheers,

Hi again @jnugent,

Thank you for your help. I edited some lines and ran the PHP script from the terminal to re-process the archived files. Everything seems to work and the files are moved to the archive folder. The log file also says that the files were processed and archived.

However, my problem is that the database is not updated and there are no entries in the metrics table for the processed files. The last entry is from 2019-03-08.

I moved the log files from this date until today to the stage folder and ran the script again, but nothing changed in the DB.

What do you think I should do?

Hi @alirezaaa

If you’re not seeing new entries in the metrics table (you can tell by looking at the first column in the metrics table, as it corresponds to the file name of the processed usage stats file), then there may be a configuration issue. Usage stats are only processed and stored if the base_url parameter of the journal as indicated in the config.inc.php file exactly matches the URL in the stats file, right down to the protocol. Can you confirm that those are the same?
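
A quick way to check this is to compare the protocol and host of base_url with a URL taken from the stats file. The sketch below is a simplified reading of "exactly matches" and not how OJS itself performs the comparison; the function name `url_matches_base` is made up for illustration:

```python
from urllib.parse import urlsplit


def url_matches_base(base_url, logged_url):
    """True when logged_url has the same scheme and host as base_url and sits under its path."""
    base = urlsplit(base_url)
    logged = urlsplit(logged_url)
    base_path = base.path.rstrip("/")
    same_origin = (base.scheme, base.netloc) == (logged.scheme, logged.netloc)
    under_path = logged.path == base_path or logged.path.startswith(base_path + "/")
    return same_origin and under_path
```

With this check, a base_url that uses http while the stats file records https URLs (or a www. prefix on only one side) is reported as a mismatch.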

Cheers,

Hi @jnugent,

This is our case:

base_url = "http://abc.com"

and the entries in the usage stats files look like this:

...115.215.209 - - "2019-05-10 00:37:47" http://abc.com/article/view/143 200 "Mozilla/5.0"

So, I guess the config is correct.

Something interesting has happened. I checked the metrics table just now and the last entry is from 20181229. I am pretty sure that the last entry was from 20190308 yesterday. It seems some rows were deleted after I moved the archived files to the stage folder and re-ran the script yesterday.

Any ideas?!

Hey,

Did anything change with your OJS installation recently? Did you upgrade at all?

It would be interesting to see what happens if you created a brand new OJS installation on a local test machine, put all of the archived stat files in the stage folder and see what you end up with. It’s strange to me that it sounds like you’ve lost everything from 2019, if your most recent record is from 20181229.

How did you rerun the processing?

Cheers,

Hi @jnugent,

I am still stuck with this problem.

  1. I copied the files from the archive folder to the stage folder.

  2. I deleted the plugins.generic.usageStats.UsageStatsLoader entry from the scheduled_tasks table.

  3. I ran the command:
    php tools/runScheduledTasks.php lib/pkp/plugins/generic/usageStats/scheduledTasks.xml

The files are moved from the stage folder to the archive folder, but the metrics table does not change.

I checked the file permissions and the Acron plugin, and they were fine.

Some log files are owned by root and some by apache. I don’t know whether this is a problem.

By the way, there are some 300-400 old rows in the usage_stats_temporary_records table. Is this normal?

Best,

Hi @alirezaaa

No, there shouldn’t be records in the temp table unless a stats process is running.

The files should be owned by the user your webserver or PHP script runs as. If things are owned by root, you might get into permissions issues if you need to read or delete them.

If you’re still not getting new records in the metrics table, I suggest making a backup of the table with a mysqldump command and then deleting a specific stats log by eliminating records based on the load_id and then importing that specific file again. Things should go in, if the base_url and the url in the stats log are the same.

Cheers,

Hi @jnugent,

I gave up!

I truncated the metrics and usage_stats_temporary_records tables and copied some files back to the stage folder from the archive.

The files are processed and moved to the archive, and there are some records in the usage_stats_temporary_records table, but nothing happens to metrics.

base_url is correct.

Is there a script to copy from usage_stats_temporary_records table to metrics?

Regards,

Are you getting any PHP errors when you run the import? Anything if you set debug = On in your config file? Anything to indicate a database schema mismatch or something?

There’s no script to copy records from the temp table to the metrics one, unfortunately, but the fact that things aren’t going into the metrics table is weird, if everything is the same as far as the base url goes.

Hi @jnugent,

I found that the usage_events log files are parsed without any problem if I delete all lines containing the word ‘download’!
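
To test this without deleting anything by hand, a log file can be split into two sets of lines, those containing the keyword and those without it, so each set can be staged separately. This is only an illustrative sketch (the function name `split_on_keyword` is made up), and it does not explain why the download lines fail:

```python
def split_on_keyword(lines, keyword="download"):
    """Split lines into (lines containing keyword, lines without it), preserving order."""
    with_keyword, without_keyword = [], []
    for line in lines:
        if keyword in line:
            with_keyword.append(line)
        else:
            without_keyword.append(line)
    return with_keyword, without_keyword
```

Writing the two lists to separate files keeps the original log intact while each half is tried on its own.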

I spent a couple of hours trying to figure out why this happens, with no success.

Any ideas?

Has this problem been solved? I have the same issue.