Usagestats file loader task error after upgrade to 3.1.0.1

Maybe @bozana could check this.

I noticed that after upgrading to OJS 3.1.0.1 I get an error in the middle of running the usage stats task. I added some debugging to UsageStatsLoader.inc.php (ojs-stable-3_1_0 branch of pkp/ojs on GitHub) to print out the $articleFile object before the error:

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] SubmissionFile Object\n(\n    [_data] => Array\n        (\n            [fileId] => 50054\n            [submissionLocale] => fi_FI\n            [revision] => 1\n            [assocType] => 521\n            [assocId] => 16578\n            [submissionId] => 53302\n            [fileStage] => 10\n            [originalFileName] => 53302-50053-1-CE.pdf\n            [filetype] => application/pdf\n            [genreId] => 457\n            [fileSize] => 3963948\n            [uploaderUserId] => 1\n            [userGroupId] => 750\n            [viewable] => \n            [dateUploaded] => 2015-11-17 20:38:50\n            [dateModified] => 2015-11-17 20:38:50\n            [name] => Array\n                (\n                    [fi_FI] => 53302-50054-PB\n                )\n\n        )\n\n    [_hasLoadableAdapters] => \n    [_metadataExtractionAdapters] => Array\n        (\n        )\n\n    [_extractionAdaptersLoaded] => \n    [_metadataInjectionAdapters] => Array\n        (\n        )\n\n    [_injectionAdaptersLoaded] => \n)\n

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] SubmissionFile Object\n(\n    [_data] => Array\n        (\n            [fileId] => 84426\n            [submissionLocale] => fi_FI\n            [revision] => 1\n            [assocType] => 521\n            [assocId] => 29873\n            [submissionId] => 68578\n            [fileStage] => 10\n            [originalFileName] => filename.pdf.htm\n            [filetype] => text/html\n            [fileSize] => 51997\n            [uploaderUserId] => 1\n            [userGroupId] => 1409\n            [viewable] => \n            [dateUploaded] => 2017-12-15 20:40:47\n            [dateModified] => 2017-12-15 20:40:47\n            [name] => Array\n                (\n                    [fi_FI] => filename.pdf.htm\n                )\n\n        )\n\n    [_hasLoadableAdapters] => \n    [_metadataExtractionAdapters] => Array\n        (\n        )\n\n    [_extractionAdaptersLoaded] => \n    [_metadataInjectionAdapters] => Array\n        (\n        )\n\n    [_injectionAdaptersLoaded] => \n)\n

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] PHP Fatal error:  Call to a member function getCategory() on null in /plugins/generic/usageStats/UsageStatsLoader.inc.php on line 79

Note that the first SubmissionFile object does not cause any problems, but the second one triggers the error. Is the missing genre_id the reason? I ask because I have 2000 rows in submission_files with genre_id set to NULL.

@bozana I think this has to do with the bug where genre_id was set to NULL if the object was edited in 3.0.2, right? See "Fix genre assignment for upgrades", Issue #2506 in pkp/pkp-lib on GitHub.

Do you have a cool piece of SQL to fix those 2000 rows of files?

Hi @ajnyga, so the missing genre_id for submission_files is the problem. As far as I know, all files should have a genre_id (i.e. I don't know why your files have genre_id = NULL), but it is not actually required by the DB table schema, so maybe @asmecher knows of cases where files could have genre_id = NULL? In that case I would have to account for that in the script.

What kind of files are missing the genre_id in your DB table? Are they all article full texts?
Maybe you can first investigate only the galleys, because they are publicly available and are the ones causing the problem here. First see whether any of them is a supp or artwork file, i.e. something like:
SELECT file_id FROM submission_files WHERE genre_id IS NULL AND assoc_type = 521 AND (file_id IN (SELECT file_id FROM submission_supplementary_files) OR file_id IN (SELECT file_id FROM submission_artwork_files))
Those that are not supp or artwork files are probably full texts then, right? Then you could use something like this SQL:
UPDATE submission_files sf, genres g, submissions s SET sf.genre_id = g.genre_id WHERE sf.genre_id IS NULL AND g.entry_key = 'SUBMISSION' AND g.context_id = s.context_id AND s.submission_id = sf.submission_id AND sf.assoc_type = 521 AND sf.file_id NOT IN (SELECT file_id FROM submission_supplementary_files) AND sf.file_id NOT IN (SELECT file_id FROM submission_artwork_files)
For those that are supp or artwork files, you would choose an appropriate genre key instead of SUBMISSION.
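One detail worth checking when hunting for the affected rows: in SQL a comparison with NULL never evaluates to true, so missing genre_id values have to be matched with IS NULL. The following is only a minimal sketch of that behaviour using an in-memory SQLite database from Python. The table is a cut-down stand-in for submission_files (an assumption, not the real OJS schema), and the two rows reuse the file ids from the log output above.

```python
import sqlite3

# Cut-down stand-in for submission_files; NOT the real OJS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission_files (file_id INTEGER, genre_id INTEGER, assoc_type INTEGER)"
)
conn.executemany(
    "INSERT INTO submission_files VALUES (?, ?, ?)",
    [(50054, 457, 521), (84426, None, 521)],
)


def file_ids(where):
    """Return the file_ids matching a WHERE clause (trusted demo input only)."""
    sql = "SELECT file_id FROM submission_files WHERE " + where + " ORDER BY file_id"
    return [row[0] for row in conn.execute(sql)]


# "= NULL" evaluates to NULL (unknown), so no row is ever selected.
equals_null = file_ids("genre_id = NULL AND assoc_type = 521")
# "IS NULL" is the predicate that actually finds the missing values.
is_null = file_ids("genre_id IS NULL AND assoc_type = 521")

print(equals_null)  # []
print(is_null)      # [84426]
```

MySQL follows the same rule for NULL comparisons, so the same predicate applies when running the queries against the OJS database.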

I am not sure if this is related to the issue above, because in that issue the files should have the genre_id = 1 and not NULL.

Best,
Bozana

Hm, it seems that most of them are reviewer reports. I will check which file was causing the error, and whether it is actually a single case of a galley file with a NULL genre_id.

Is there a specific file_stage I should be looking at with these cases?

@bozana, I think this is due to a bad XML import. The import was probably missing a value for the genre attribute in the revision tag.

If you have <revision genre=""> in the import XML, you apparently end up with NULL values in the database…
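A pre-import check for this could look roughly like the sketch below, which scans an XML document for revision elements whose genre attribute is empty or absent. The sample XML is deliberately simplified and hypothetical: it is not the real OJS native import format (which, among other things, uses XML namespaces that would need to be handled), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; NOT the real OJS native XML schema.
SAMPLE = """
<article>
  <submission_file id="1">
    <revision number="1" genre="Article Text" filename="a.pdf"/>
  </submission_file>
  <submission_file id="2">
    <revision number="1" genre="" filename="b.pdf.htm"/>
  </submission_file>
</article>
"""


def revisions_missing_genre(xml_text):
    """Return the filenames of revision elements with an empty or absent genre."""
    root = ET.fromstring(xml_text)
    missing = []
    for rev in root.iter("revision"):
        # rev.get() returns None when the attribute is absent.
        if not (rev.get("genre") or "").strip():
            missing.append(rev.get("filename"))
    return missing


missing = revisions_missing_genre(SAMPLE)
print(missing)  # ['b.pdf.htm']
```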

But this is solved. I looked up the SUBMISSION genre_id for the journal and set it on the rows where genre_id was NULL and file_stage was 10. They all came from a single journal. We have only imported full texts, so it was a safe bet. Thanks once again @bozana!
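For anyone wanting to try this kind of fix on a copy of their data first, here is a rough sketch of the idea against an in-memory SQLite database. The three tables are cut-down stand-ins for the OJS ones, the sample rows (including genre id 900 and the context ids) are invented for the demo, and the UPDATE uses a correlated subquery because SQLite does not accept the MySQL multi-table UPDATE syntax. Treat it as an illustration of the logic, not as a statement to run against production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-ins for the OJS tables; sample rows are invented.
conn.executescript("""
CREATE TABLE genres (genre_id INTEGER, context_id INTEGER, entry_key TEXT);
CREATE TABLE submissions (submission_id INTEGER, context_id INTEGER);
CREATE TABLE submission_files (
    file_id INTEGER, submission_id INTEGER, file_stage INTEGER, genre_id INTEGER
);
INSERT INTO genres VALUES (457, 1, 'SUBMISSION'), (900, 2, 'SUBMISSION');
INSERT INTO submissions VALUES (53302, 1), (68578, 2);
INSERT INTO submission_files VALUES
    (50054, 53302, 10, 457),
    (84426, 68578, 10, NULL),
    (84427, 68578, 5, NULL);
""")

# Give every stage-10 file without a genre the SUBMISSION genre of its own journal.
conn.execute("""
UPDATE submission_files
SET genre_id = (
    SELECT g.genre_id
    FROM genres g
    JOIN submissions s ON s.context_id = g.context_id
    WHERE s.submission_id = submission_files.submission_id
      AND g.entry_key = 'SUBMISSION'
)
WHERE genre_id IS NULL AND file_stage = 10
""")

rows = conn.execute(
    "SELECT file_id, genre_id FROM submission_files ORDER BY file_id"
).fetchall()
print(rows)  # [(50054, 457), (84426, 900), (84427, None)]
```

The file in the other stage keeps its NULL genre_id, which matches the approach of only touching stage-10 rows.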

Ah, yes, that could be. Now, I think, such an import would not be possible – there would be the appropriate import result message (that the genre could not be found)…
Thanks a lot @ajnyga!

Another error, this time on line 77 with the $articleFile->getGenreId() call. I am debugging this and will let you know what it is. Two log files were already processed successfully.
This is something different. @bozana, I would bet some money that you will get a lot of similar questions in the forum during the next few months :smiley:

@bozana

The submission_file in this case is missing altogether.
What happens with the current code in a scenario where:

  1. A journal adds galley file A and publishes the article. The file_id is saved in the usage stats log.
  2. Before the logs are processed, the journal removes galley file A and adds galley file B.
  3. Scheduled tasks are run, and the submission_file with the id mentioned in the logs cannot be found => error on line 77, right?

edit: this scenario definitely needs fixing. I removed the references to the removed file and the log file got processed. But basically anyone could run into a similar problem.

Hi @ajnyga

Hmmm… I will then look at how to just write a warning to the scheduled task log file and continue with the processing in such a case…

Thanks a lot!
Bozana

Thanks, all the log files have now been processed successfully. There was only one such case.

I created a GitHub issue and provided a patch there: "consider missing submission file in usage stats loader", Issue #3332 in pkp/pkp-lib. As in other similar cases, I just break out of the switch statement, i.e. continue with the next line in the log file…
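To illustrate the idea of skipping a log entry whose file no longer exists, here is a minimal sketch in Python. It is not the actual patch (the loader is PHP, and the real change is in the issue referenced above). The function name, the dictionary-based lookup, and the entry layout are all hypothetical stand-ins chosen for the example.

```python
import logging

logger = logging.getLogger("usage_stats_sketch")  # hypothetical logger name


def process_log_entries(entries, files_by_id):
    """Process usage log entries, skipping those whose file has been removed.

    entries: list of dicts with a "file_id" key (stand-in for parsed log lines).
    files_by_id: dict mapping file_id to a file record (stand-in for a DAO lookup).
    """
    processed, skipped = [], []
    for entry in entries:
        article_file = files_by_id.get(entry["file_id"])
        if article_file is None:
            # The file was logged once but has since been deleted:
            # warn and move on instead of failing the whole task.
            logger.warning("Submission file %s not found, skipping entry", entry["file_id"])
            skipped.append(entry["file_id"])
            continue
        processed.append((entry["file_id"], article_file["genre_id"]))
    return processed, skipped


files = {50054: {"genre_id": 457}}
processed, skipped = process_log_entries(
    [{"file_id": 50054}, {"file_id": 99999}], files
)
print(processed)  # [(50054, 457)]
print(skipped)    # [99999]
```

The point of the design is that one stale file_id in a log costs a warning rather than aborting the remaining lines of the log file.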

Thanks @ajnyga!

Hi all,

I think reviewer reports legitimately don’t have a genre ID.

Regards,
Alec Smecher
Public Knowledge Project Team

Yes, those did not turn out to be the issue here. There were actually two separate cases. The first one was due to a bad XML import, and the second one is the scenario Bozana describes in the GitHub issue above.

Great, then the requirement for the published files to have a genre_id is correct…
Thanks a lot to both of you, @ajnyga and @asmecher!
:-))) Bozana

So, if I have a similar problem, what should I do? Will this be fixed in the next OJS release, or do I need to edit the database?

Hi @Vitaliy

If the problem is that a file that was logged once no longer exists, then there is a fix in this GitHub issue: "consider missing submission file in usage stats loader", Issue #3332 in pkp/pkp-lib.

The second problem here was that some public galley/supp files did not have a genre_id in the DB because of a faulty import. This has to be fixed manually in the DB.

Best,
Bozana

See above :smiley:
Note that "Fix genre assignment for upgrades" (Issue #2506 in pkp/pkp-lib on GitHub) was not related to this.

Hmm, I have checked the database, and it seems the error we get after the upgrade has a different origin. I will open another topic for it.