[OJS-3.4.0-4] Statistic problem

Hi @shantanusingh

You do not need to roll back the code.

Let me think about the error… At the moment I do not understand why it is happening…

Hi @shantanusingh

Could you double check if you have foreign keys in your DB table usage_stats_total_temporary_records? – e.g. in PHPMyAdmin, if you go to the table, then Structure, then Relation view. There should be 6 foreign keys, on the following columns: context_id, issue_galley_id, issue_id, representation_id, submission_file_id, submission_id.
So the foreign key constraint should already fail on that table, and not later when inserting into metrics_submission.

EDIT: also other temporary tables should have foreign keys…
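
For anyone who prefers SQL to the Relation view, the same check can be sketched as a query against information_schema. This assumes MySQL or MariaDB and that the query is run while connected to the OJS database; the four table names are the usage stats temporary tables named in this thread.

```sql
-- List every foreign key column defined on the usage stats temporary tables.
-- Assumes MySQL/MariaDB; DATABASE() is the schema of the current connection.
SELECT TABLE_NAME,
       COLUMN_NAME,
       CONSTRAINT_NAME,
       REFERENCED_TABLE_NAME,
       REFERENCED_COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN (
        'usage_stats_total_temporary_records',
        'usage_stats_unique_item_investigations_temporary_records',
        'usage_stats_unique_item_requests_temporary_records',
        'usage_stats_institution_temporary_records'
      )
  AND REFERENCED_TABLE_NAME IS NOT NULL  -- keep foreign keys only, skip primary/unique keys
ORDER BY TABLE_NAME, COLUMN_NAME;
```

Each returned row is one foreign key column, so usage_stats_total_temporary_records should contribute six rows if all six expected keys are present.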

Hi @bozana I have checked the journal statistics and found that usage_events_20240131.log has produced statistics that were processed on a previous date.

usage_events_20240201.log is processed and archived but the statistics are not generated; maybe they will be generated the next day. I will take a look and update you.

I have checked the temporary DB and found all foreign keys in usage_stats_total_temporary_records.

Two foreign keys (issue_galley_id and issue_id) are not present in usage_stats_unique_item_requests_temporary_records and usage_stats_unique_item_investigations_temporary_records.

I don't know whether they are needed or not.

What do you mean by "usage_events_20240201.log is processed and archived but the statistics are not generated"?

The way it works is this: first the file is parsed, the entries are saved into the temporary tables, (currently) 2 jobs are dispatched, and the file is archived. Then, on the next request (or two), the jobs are run and the statistics are calculated and saved into the metrics tables.

Your foreign keys seem to be right. Hmmmm… I do not understand how it comes that the error only occurs when saving into the metrics tables. It should occur when inserting into the temporary tables, where it would be ignored, i.e. the row would not be inserted into the temporary tables and the processing would continue. Because that row would not be in the temporary tables, it could not cause any problems when saving the metrics from the temporary tables into the metrics tables. But in your case it seems the row (that has a non-existent galley/representation ID) is inserted into the temporary table, as if the foreign key constraint were ignored there :open_mouth:, and the error then occurs when it is saved into the metrics tables.
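
One way to test whether such a row really made it into the temporary table is an anti-join against the galley table. This is only a sketch: it assumes the representation_id foreign key points at publication_galleys.galley_id, which the Relation view should confirm or correct for your installation.

```sql
-- Rows in the temporary table whose representation_id has no matching galley.
-- publication_galleys and galley_id are assumptions; use whatever the
-- representation_id foreign key actually references in your database.
SELECT t.*
FROM usage_stats_total_temporary_records AS t
LEFT JOIN publication_galleys AS g
       ON g.galley_id = t.representation_id
WHERE t.representation_id IS NOT NULL   -- rows without a galley are not of interest here
  AND g.galley_id IS NULL;              -- no parent row was found
```

With a working foreign key this should return no rows; any row it does return is one that the constraint did not stop.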

EDIT: Have you double checked that temporary tables are empty before re-processing the log file?

I checked in the morning and the statistics (usage_events_20240131.log) are as shown. I don't know how this happened.

I have re-processed usage_events_20240201.log; it is processed and stored, but the statistics are not generated. They may be generated on the next processing day.

Yes, I have run the delete command for those tables.

DELETE FROM usage_stats_institution_temporary_records;
DELETE FROM usage_stats_total_temporary_records;
DELETE FROM usage_stats_unique_item_investigations_temporary_records;
DELETE FROM usage_stats_unique_item_requests_temporary_records;

Strange, strange… :slight_smile:

I do not know what is happening or what happened with 20240131 log file…

It seems you can see entries for usage_events_20240201.log in the metrics tables, i.e. in metrics_submission? If so, then it is OK. I can double check how the graph works; it could be that it needs some time, i.e. that yesterday is the last possible date…

Yes, :grinning: :rofl:

I don't know what is happening, or whether the statistics are for 20240131 or 20240201.

Maybe one more thing to double check: what is in the column date in the metrics_submission table where load_id = usage_events_20240201.log ? Is it 2024-02-01?

SELECT * FROM metrics_submission WHERE load_id = 'usage_events_20240201.log';
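
If the full rows are too many to read, a grouped variant of the query above may be quicker to check. This sketch uses only the columns mentioned in this thread (load_id, date, assoc_type):

```sql
-- One row per date and assoc_type for the given log file,
-- with the number of metrics rows saved for each combination.
SELECT `date`, assoc_type, COUNT(*) AS row_count
FROM metrics_submission
WHERE load_id = 'usage_events_20240201.log'
GROUP BY `date`, assoc_type
ORDER BY `date`, assoc_type;
```

Every row should show 2024-02-01 in the date column, and the assoc_type column shows at a glance whether abstract pages (1048585) and files (515, 531) are both present.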

Below is the 2024-02-01 log.

Below is the 2024-01-31 log.

Hi @shantanusingh

Yes, it seems to be OK: in the DB table screenshot for 20240201 I only see the assoc_type for files (515, 531), and the graph shows it too when you switch to “files”. It is a little bit strange, as it seems no abstract page was accessed on that day :open_mouth:
You could also double check in the log file for 20240201 itself, e.g. whether there is any entry with "assocType":1048585.

And strangely for 20240131 there are only abstract pages (1048585) and no files :open_mouth: But maybe the saving failed and was not complete because of the foreign key error that you get.

Hi @bozana
I have checked and found "assocType":1048585 in log 20240201.

Hi @shantanusingh

Are those entries with "assocType":1048585 in usage_events_20240201.log for that journal (the one you sent the graph screenshot of)?

If so, something seems to be wrong with stats processing from that file as well :frowning:

I believe you would then need to somehow debug the processing:

  1. Be sure that the log file from yesterday has already been processed before starting with debugging/investigating – in order not to break the processing of that new log file and not to collect too much data in the temporary tables (which would lead to a timeout).
    Then:
  2. It would be good to see what is in the temporary usage tables once the scheduled task has been run, e.g. in usage_stats_total_temporary_records – if those entries look OK i.e. if they contain assoc_type = 1048585 for that journal.
    Because the scheduled task dispatches the jobs that will then be immediately run with the next request of an OJS web page, the entries in the usage stats temporary tables could be deleted quickly – after the job has run.
    Thus, it would be best to comment out the following lines of the code (that remove the entries in the temporary tables) before re-processing the file usage_events_20240201.log:
    ojs/jobs/statistics/CompileUsageStatsFromTemporaryRecords.php at stable-3_4_0 · pkp/ojs · GitHub.
    But once you have seen what is in the temporary tables and/or finished the investigation, do not forget to remove the comments from those lines (and definitely before a new log file is processed).
  3. During the whole processing, watch the error logs to see whether any error occurs.
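
For step 2, a grouped count may be easier to read than the raw rows. This is a sketch that assumes the temporary tables were emptied before re-processing, so that every row in them comes from usage_events_20240201.log:

```sql
-- Number of temporary records per journal (context_id) and assoc_type.
-- Run this after the scheduled task has parsed the file and before the
-- dispatched jobs have removed the temporary records.
SELECT context_id, assoc_type, COUNT(*) AS row_count
FROM usage_stats_total_temporary_records
GROUP BY context_id, assoc_type
ORDER BY context_id, assoc_type;
```

A row with assoc_type = 1048585 for the journal's context_id would mean the abstract page views reached the temporary table, which narrows the problem down to the step that saves them into the metrics tables.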

You could also do the same for the 20240131 log file, to see why there is no insertion error for that galley ID that does not exist… But one thing at a time…

Best,
Bozana

I have checked: the 20240204 log file is still in the processing folder, although it should have been completed in the morning. There are no failed jobs.

Hi @bozana

[2024-02-05 06:10:09] https://epubs.icar.org.in/
[2024-02-05 06:10:09] [Notice] Task process started.


[2024-02-06 05:10:05] https://epubs.icar.org.in/
[2024-02-06 05:10:05] [Notice] Task process started.

As far as I can see, the statistics compilation has not completed for the last two days. The statistics for 20240204 and 20240205 have not been generated.

Yesterday I saw that log file 20240204 was in the processing folder and today it is in the stage folder and log file 20240205 is in the processing folder.

As mentioned, I also ran the processing manually, but the compilation does not complete and the log file is stuck in the processing folder.

There are no failed jobs in the failed jobs queue.

I don’t know what’s happening.

Hi @shantanusingh

Do you see any errors in the PHP error log file? It seems like the scheduled task cannot complete for some reason…

I have checked; no errors are reported in the error log.

Hmmm… strange…
Do you see any entries in the temporary stats tables?

Are you sure the last patch was applied successfully?
Do you see these two new lines in your PKPUsageStatsLoader: pkp/pkp-lib#9679 allow processing of the log files from the last month · pkp/pkp-lib@d0862af · GitHub ?
But somehow I think you would get/see an error if this would be wrong… Hmmm…

Can you try the following:

  • temporarily move the 20240205 log file to the archive folder
  • move the 20240204 log file to the usageEventLogs folder
  • clear temporary usage stats tables (all)
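  • remove the row that contains APP\tasks\UsageStatsLoader from the DB table scheduled_tasks, e.g. by running the statement below (the backslashes are doubled because MySQL/MariaDB by default treats a single backslash inside a string literal as an escape character):
    DELETE FROM scheduled_tasks WHERE class_name = 'APP\\tasks\\UsageStatsLoader';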
  • request an OJS page – the next request of your journal’s page will run the scheduled task
  • watch for errors in the server's PHP error log. Are there any?
  • where is the log file now?
  • take a look in the scheduledTaskLogs folder, in the newest Usagestatisticsfileloadertask… file. What is there?
  • take a look in the temporary usage stats tables. What is there?
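
Before requesting the OJS page, the two database steps in the list above could be verified with something like the following sketch. It only uses table and column names that appear in this thread, and the LIKE pattern avoids having to escape the backslashes in the class name.

```sql
-- 1. All four temporary tables should report 0 rows after clearing them.
SELECT 'usage_stats_total_temporary_records' AS table_name, COUNT(*) AS row_count
FROM usage_stats_total_temporary_records
UNION ALL
SELECT 'usage_stats_unique_item_investigations_temporary_records', COUNT(*)
FROM usage_stats_unique_item_investigations_temporary_records
UNION ALL
SELECT 'usage_stats_unique_item_requests_temporary_records', COUNT(*)
FROM usage_stats_unique_item_requests_temporary_records
UNION ALL
SELECT 'usage_stats_institution_temporary_records', COUNT(*)
FROM usage_stats_institution_temporary_records;

-- 2. The scheduled task row should be gone, so this should return no rows.
SELECT class_name
FROM scheduled_tasks
WHERE class_name LIKE '%UsageStatsLoader';
```

Running the first query again after the scheduled task has run also answers the last bullet, since non-zero counts show which temporary tables were filled.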

If that does not tell us anything:
Can you revert the last patch, i.e. the two changes from the link above, and then try the steps here again?

Hi @bozana

I have rolled back the last patch (pkp/pkp-lib#9679 allow processing of the log files from the last month · pkp/pkp-lib@d0862af · GitHub), and after reprocessing, the log file was successfully processed and archived. Statistics have been generated for both dates, 20240204 and 20240205. I will watch the log file processing over the next 1-2 days and update you.

I think that if we remove the last patch, we will again face problems processing files on the last day of the month, like last time.