[OJS-3.4.0-4] Problem in processing statistics

Hi,

I have a problem processing the access logs.
A query that never finishes is running in the database (PostgreSQL):


This screenshot is more recent, taken after I tried restarting the server.
Before that, I had queries like this running for about 20 days without finishing.

It all started last month when we noticed that the acron plugin was disabled. There were around 20 log files in the folder files-ojs/usageStats/usageEventLogs/
I reactivated the acron plugin and all the files were processed and placed inside the archive folder.

I have 120 failed jobs:

Today the usage_stats_total_temporary_records table has 4793988 records.
What can be done to process this data? New data arrives before the processing finishes, so it never ends.

Based on this other topic, "[OJS-3.4.0-4] Statistic problem", I thought about doing the following steps:

1- Clear all temporary tables

DELETE FROM usage_stats_institution_temporary_records;
DELETE FROM usage_stats_total_temporary_records;
DELETE FROM usage_stats_unique_item_investigations_temporary_records;
DELETE FROM usage_stats_unique_item_requests_temporary_records;

2- Clear the failed statistics jobs

3- Maybe add indexes to the usage stats temporary tables (pkp/pkp-lib#9627, "Add indexes to usage stats temporary tables, conside…", commit pkp/pkp-lib@9f2123e)

4- Take all the files that were not processed and reprocess them 7 at a time, to keep the number of records in the table small.
Once a batch has finished processing, I place 7 more files inside the usageEventLogs/ directory.
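
A minimal sketch of how step 4 could be scripted, in Python. The staging directory and the function name `release_next_batch` are hypothetical, not part of OJS, and the chronological ordering assumes the log file names sort by date. Note that an empty usageEventLogs/ directory only shows that the files were picked up; the temporary tables and the jobs list still need to be checked before releasing the next batch.

```python
import shutil
from pathlib import Path


def release_next_batch(staging_dir: Path, logs_dir: Path, batch_size: int = 7) -> list[str]:
    """Move up to batch_size log files from staging_dir into logs_dir.

    Nothing is moved while logs_dir still contains files, so a new batch is
    only released after the previous one has been picked up.
    """
    # A file still waiting in usageEventLogs/ means the last batch is not done.
    if any(p.is_file() for p in logs_dir.iterdir()):
        return []

    # Oldest first, assuming the file names sort chronologically.
    pending = sorted(p for p in staging_dir.iterdir() if p.is_file())

    moved = []
    for path in pending[:batch_size]:
        shutil.move(str(path), str(logs_dir / path.name))
        moved.append(path.name)
    return moved
```

Called as `release_next_batch(Path("/path/to/staging"), Path("/path/to/files-ojs/usageStats/usageEventLogs"))`, with both paths adjusted to the installation, it returns the names of the files it moved, or an empty list when the previous batch is still waiting.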

@bozana and @jonasraoni What do you think?

Best regards,
Tarcisio Pereira

Hi @Tarcisio_Pereira

It would actually be best to upgrade to the newest OJS 3.4 release – there were lots of improvements and fixes in between. For example, these issues were all fixed in 3.4.0.5:

In 3.4.0.6:

In 3.4.0.7:

You might not need to apply all of them – I have just listed them as examples, so you can see everything regarding usage stats processing that has been improved or fixed in between.
I believe you would probably need to apply all of the fixes listed above for 3.4.0.5.

Otherwise, yes, that would be a way to proceed:

  • clear all usage_stats_…_temporary… DB tables
  • clear all failed usage stats jobs
  • you would probably need to apply all the patches listed above for 3.4.0.5
  • if you do not keep the daily stats (you can check this in the UI under Administration > Site Settings > Statistics > Data Storage > Monthly or Daily Statistics), you would need to reprocess the files for the whole last month (see https://docs.pkp.sfu.ca/admin-guide/en/statistics#reprocess-log-files). Because this could take a long time, it may be best to first check that everything is working correctly, e.g. with one log file from the current month. Also, keep an eye on the temporary usage stats tables: they should be empty after each usage stats job execution.
    If you keep the daily stats in your DB, you can process the log files independently, so you could start by processing them one at a time for a few runs and then try 7 or so.
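
A minimal sketch of that check on the temporary tables, in Python. It assumes the four table names listed earlier in this thread and any DB-API connection to the OJS database (for PostgreSQL, for example one opened with psycopg2); the function names are hypothetical and not part of OJS.

```python
# Temporary usage stats tables, as named earlier in this thread.
TEMPORARY_TABLES = (
    "usage_stats_institution_temporary_records",
    "usage_stats_total_temporary_records",
    "usage_stats_unique_item_investigations_temporary_records",
    "usage_stats_unique_item_requests_temporary_records",
)


def temporary_table_counts(connection):
    """Return {table_name: row_count} for each temporary usage stats table."""
    counts = {}
    cursor = connection.cursor()
    for table in TEMPORARY_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    cursor.close()
    return counts


def leftover_tables(counts):
    """Return the tables that still hold rows after a job run."""
    return [table for table, rows in counts.items() if rows > 0]
```

Running `leftover_tables(temporary_table_counts(connection))` after each usage stats job should give an empty list; any table it names still holds unprocessed records.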

Best,
Bozana


Hi @bozana,

I’ll try this and report back on whether it worked.

Regards,
Tarcisio Pereira

Hi @bozana

I have not applied any patches yet, but processing a one-day file worked.
However, the processing time is very long: it took about 20 hours to complete a file with 80 thousand lines.

Is this expected?
If I apply the patches, will the processing time decrease?

Best,
Tarcisio Pereira

Hi @bozana,

I applied the patches and it worked fine.
The files that used to take 20 hours to process now take a minute and a half.
They were all reprocessed and the situation is back to normal.
Thank you.

@Tarcisio_Pereira, great! 🎉🎉🎉 🙂
