Usagestats file loader task error after upgrade to 3.1.0.1

ajnyga · January 30, 2018, 12:15pm

Maybe @bozana could check this.

I noticed that after upgrading to OJS 3.1.0.1 I get an error in the middle of running the usage stats task. I added some debugging to UsageStatsLoader.inc.php (ojs/UsageStatsLoader.inc.php at ojs-stable-3_1_0 · pkp/ojs · GitHub) to print out the $articleFile object before the error:

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] SubmissionFile Object\n(\n    [_data] => Array\n        (\n            [fileId] => 50054\n            [submissionLocale] => fi_FI\n            [revision] => 1\n            [assocType] => 521\n            [assocId] => 16578\n            [submissionId] => 53302\n            [fileStage] => 10\n            [originalFileName] => 53302-50053-1-CE.pdf\n            [filetype] => application/pdf\n            [genreId] => 457\n            [fileSize] => 3963948\n            [uploaderUserId] => 1\n            [userGroupId] => 750\n            [viewable] => \n            [dateUploaded] => 2015-11-17 20:38:50\n            [dateModified] => 2015-11-17 20:38:50\n            [name] => Array\n                (\n                    [fi_FI] => 53302-50054-PB\n                )\n\n        )\n\n    [_hasLoadableAdapters] => \n    [_metadataExtractionAdapters] => Array\n        (\n        )\n\n    [_extractionAdaptersLoaded] => \n    [_metadataInjectionAdapters] => Array\n        (\n        )\n\n    [_injectionAdaptersLoaded] => \n)\n

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] SubmissionFile Object\n(\n    [_data] => Array\n        (\n            [fileId] => 84426\n            [submissionLocale] => fi_FI\n            [revision] => 1\n            [assocType] => 521\n            [assocId] => 29873\n            [submissionId] => 68578\n            [fileStage] => 10\n            [originalFileName] => filename.pdf.htm\n            [filetype] => text/html\n            [fileSize] => 51997\n            [uploaderUserId] => 1\n            [userGroupId] => 1409\n            [viewable] => \n            [dateUploaded] => 2017-12-15 20:40:47\n            [dateModified] => 2017-12-15 20:40:47\n            [name] => Array\n                (\n                    [fi_FI] => filename.pdf.htm\n                )\n\n        )\n\n    [_hasLoadableAdapters] => \n    [_metadataExtractionAdapters] => Array\n        (\n        )\n\n    [_extractionAdaptersLoaded] => \n    [_metadataInjectionAdapters] => Array\n        (\n        )\n\n    [_injectionAdaptersLoaded] => \n)\n

[Tue Jan 30 13:59:34 2018] [error] [client 54.36.148.51] PHP Fatal error:  Call to a member function getCategory() on null in /plugins/generic/usageStats/UsageStatsLoader.inc.php on line 79

Note that the first SubmissionFile object does not cause any problems, but the second one triggers the error. Is the missing genre_id the reason? I ask because I have 2000 rows in submission_files with genre_id set to NULL.

ajnyga · January 30, 2018, 12:23pm

@bozana I think this is related to the bug where genre_id was set to NULL if the object was edited in 3.0.2, right? Fix genre assignment for upgrades · Issue #2506 · pkp/pkp-lib · GitHub

Do you have a handy piece of SQL to fix those 2000 rows of files?

bozana · January 30, 2018, 1:07pm

Hi @ajnyga, so the missing genre_id for submission_files is the problem. As far as I know, all files should have a genre_id (i.e. I don’t know why your files have genre_id = NULL), but the DB table schema does not actually require it, so maybe @asmecher knows of cases where files could have genre_id = NULL? In that case I would have to account for that in the script.

What kind of files are missing the genre_id in your DB table? Are they all article full texts?
Maybe you can first investigate only the galleys, because they are publicly available and are what causes the problem here. First check whether any of them are supp or artwork files, i.e. something like:
SELECT file_id FROM submission_files WHERE genre_id IS NULL AND assoc_type = 521 AND (file_id IN (SELECT file_id FROM submission_supplementary_files) OR file_id IN (SELECT file_id FROM submission_artwork_files))
Those that are not supp or artwork files are probably full texts then, right? For those you could use SQL something like:
UPDATE submission_files sf, genres g, submissions s SET sf.genre_id = g.genre_id WHERE sf.genre_id IS NULL AND g.entry_key = 'SUBMISSION' AND g.context_id = s.context_id AND s.submission_id = sf.submission_id AND sf.assoc_type = 521 AND sf.file_id NOT IN (SELECT file_id FROM submission_supplementary_files) AND sf.file_id NOT IN (SELECT file_id FROM submission_artwork_files)
For those that are supp or artwork files, you would choose an appropriate genre key instead of SUBMISSION.
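
One thing to watch when testing for missing genres: in SQL, a comparison written as genre_id = NULL evaluates to unknown for every row, so it matches nothing, while genre_id IS NULL finds the rows. The following is only a small sketch of that behaviour, using Python's sqlite3 module and an invented, cut-down submission_files table (the column names come from this thread; the rows and the in-memory database are assumptions for illustration). It also groups the affected rows by file_stage, which may help answer what kind of files they are.

```python
import sqlite3

# Hypothetical, minimal stand-in for the submission_files table. Only columns
# mentioned in this thread are included, and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission_files ("
    "file_id INTEGER, genre_id INTEGER, file_stage INTEGER, assoc_type INTEGER)"
)
conn.executemany(
    "INSERT INTO submission_files VALUES (?, ?, ?, ?)",
    [
        (50054, 457, 10, 521),   # galley file with a genre
        (84426, None, 10, 521),  # galley file without a genre
        (90001, None, 5, 517),   # invented non-galley row without a genre
    ],
)

# "genre_id = NULL" is never true, so this count stays at zero.
with_equals = conn.execute(
    "SELECT COUNT(*) FROM submission_files WHERE genre_id = NULL"
).fetchone()[0]

# "genre_id IS NULL" is the test that actually finds the rows.
with_is_null = conn.execute(
    "SELECT COUNT(*) FROM submission_files WHERE genre_id IS NULL"
).fetchone()[0]

# Break the affected rows down by file_stage to see what kind of files they are.
by_stage = conn.execute(
    "SELECT file_stage, COUNT(*) FROM submission_files "
    "WHERE genre_id IS NULL GROUP BY file_stage ORDER BY file_stage"
).fetchall()

print(with_equals, with_is_null, by_stage)
```

The same NULL semantics apply in MySQL and PostgreSQL, so it is worth running the SELECT with IS NULL first and checking the row count before running any UPDATE.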

I am not sure if this is related to the issue above, because in that issue the files should have the genre_id = 1 and not NULL.

Best,
Bozana

ajnyga · January 30, 2018, 1:21pm

Hm, it seems that most of them are reviewer reports. I will check which file was causing the error, and whether it is actually a single case of a galley file with a NULL genre_id.

Is there a specific file_stage I should be looking at with these cases?

ajnyga · January 30, 2018, 1:56pm

@bozana, I think this is due to a bad XML import. The import may have been missing a value for the genre attribute in the revision tag.

ajnyga · January 30, 2018, 2:31pm

If you have <revision genre=""> in the import xml, apparently you end up with NULL values in the database…
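
A quick way to check an import file for this before importing might look like the sketch below. It is only an illustration: the revision element and its genre attribute are taken from this thread, while the sample document, the filename attribute, and the function name revisions_without_genre are assumptions. Tag names are compared without their namespace prefix so the check does not depend on which namespace the file declares.

```python
import xml.etree.ElementTree as ET

# Invented sample; a real import file has more elements and may use a namespace.
SAMPLE = """<submission_file id="1">
  <revision number="1" genre="Article Text" filename="a.pdf"/>
  <revision number="2" genre="" filename="b.pdf.htm"/>
  <revision number="3" filename="c.pdf"/>
</submission_file>"""


def revisions_without_genre(xml_text):
    """Return the filename attribute of each revision with no usable genre."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter():
        # Strip a "{namespace}" prefix, if any, before comparing the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "revision":
            continue
        # Treat a missing, empty, or whitespace-only genre attribute as a problem.
        if not (element.get("genre") or "").strip():
            missing.append(element.get("filename"))
    return missing


print(revisions_without_genre(SAMPLE))
```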

But this is solved. I just looked up the SUBMISSION genre_id for the journal and set that value on the rows where genre_id was NULL and file_stage was 10. They all came from a single journal. We have only imported full texts, so it was a safe bet. Thanks once again @bozana!
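
For anyone in the same situation, the manual fix described here could be sketched roughly as follows. This is not a tested migration script: the table and column names are the ones mentioned in this thread, but the schema is cut down, the rows are invented, and fill_missing_genre is a hypothetical helper. The UPDATE uses a subquery instead of the MySQL multi-table syntax so that it also runs on SQLite. Back up the database before trying anything like this on real data.

```python
import sqlite3

# Invented, cut-down versions of the three tables involved.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genre_id INTEGER, context_id INTEGER, entry_key TEXT);
CREATE TABLE submissions (submission_id INTEGER, context_id INTEGER);
CREATE TABLE submission_files (
    file_id INTEGER, submission_id INTEGER, genre_id INTEGER, file_stage INTEGER);
INSERT INTO genres VALUES (457, 1, 'SUBMISSION'), (900, 2, 'SUBMISSION');
INSERT INTO submissions VALUES (53302, 1), (68578, 1), (70000, 2);
INSERT INTO submission_files VALUES
    (50054, 53302, 457, 10),
    (84426, 68578, NULL, 10),
    (84427, 68578, NULL, 5),
    (95000, 70000, NULL, 10);
""")


def fill_missing_genre(conn, context_id, file_stage=10):
    """Set the journal's SUBMISSION genre on its files that have no genre."""
    row = conn.execute(
        "SELECT genre_id FROM genres "
        "WHERE context_id = ? AND entry_key = 'SUBMISSION'",
        (context_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no SUBMISSION genre for context {context_id}")
    # Only touch rows of this journal, at this file stage, with a NULL genre.
    cursor = conn.execute(
        "UPDATE submission_files SET genre_id = ? "
        "WHERE genre_id IS NULL AND file_stage = ? "
        "AND submission_id IN "
        "(SELECT submission_id FROM submissions WHERE context_id = ?)",
        (row[0], file_stage, context_id),
    )
    conn.commit()
    return cursor.rowcount


updated = fill_missing_genre(conn, context_id=1)
print(updated)
```

Restricting the update to one journal and one file stage keeps other files without a genre, and other journals, untouched.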

bozana · January 30, 2018, 2:34pm

Ah, yes, that could be. Now, I think, such an import would not be possible – there would be the appropriate import result message (that the genre could not be found)…
Thanks a lot @ajnyga!

ajnyga · January 30, 2018, 3:03pm

Another error, this time on line 77 at $articleFile->getGenreId(). I am debugging this and will let you know what it is. Two log files were already processed successfully.
This is something different. @bozana, I would bet some money that you will get a lot of similar questions in the forum during the next few months.

ajnyga · January 30, 2018, 3:18pm

@bozana

The submission_file in this case is missing altogether.
What happens with the current code in a scenario where:

A journal adds galley file A and publishes the article. The file_id is saved in the usage stats log.
Before the logs are processed, the journal removes galley file A and adds galley file B.
The scheduled tasks are run, and the submission_file with the id mentioned in the logs cannot be found => error on line 77, right?

edit: this scenario definitely needs fixing. I removed the references to the removed file and the log file got processed. But basically anyone could run into a similar problem.

bozana · January 30, 2018, 4:33pm

Hi @ajnyga

Hmmm… I will then look at how to just write a warning to the scheduled task log file but continue with the processing in such a case…

Thanks a lot!
Bozana

ajnyga · January 30, 2018, 4:47pm

Thanks, all the log files have now been processed successfully. There was only one such case.

bozana · January 30, 2018, 5:54pm

I created a GitHub Issue and provided a patch there: consider missing submission file in usage stats loader · Issue #3332 · pkp/pkp-lib · GitHub – As in other similar cases, I just break the switch statement i.e. continue with the next line in the log file…
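
The general idea of skipping a log entry whose file can no longer be found, while still recording a warning, can be illustrated with the sketch below. It is written in Python purely for illustration and is not the patch from the GitHub issue: the real loader is PHP, and every name here (process_log_entries, files_by_id, the entry format) is hypothetical.

```python
import logging

logger = logging.getLogger("usage_stats_sketch")


def process_log_entries(entries, files_by_id):
    """Count usage per file, skipping entries whose file no longer exists."""
    counts = {}
    skipped = []
    for line_number, file_id in entries:
        submission_file = files_by_id.get(file_id)
        if submission_file is None:
            # Warn and move on to the next log line instead of failing.
            logger.warning(
                "Line %d: file %s not found, entry skipped", line_number, file_id
            )
            skipped.append(line_number)
            continue
        counts[file_id] = counts.get(file_id, 0) + 1
    return counts, skipped


# Invented data: file 99999 was logged once but has since been removed.
files = {50054: {"genre_id": 457}}
entries = [(1, 50054), (2, 99999), (3, 50054)]
counts, skipped = process_log_entries(entries, files)
print(counts, skipped)
```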

Thanks @ajnyga!

asmecher · January 30, 2018, 7:18pm

Hi all,

I think reviewer reports legitimately don’t have a genre ID.

Regards,
Alec Smecher
Public Knowledge Project Team

ajnyga · January 30, 2018, 7:59pm

Yes, those did not turn out to be the issue here. There were actually two separate cases. The first one was due to a bad XML import and the second one is the scenario Bozana describes in the GitHub issue above.

bozana · January 30, 2018, 8:11pm

Great, then the requirement for the published files to have a genre_id is correct…
Thanks a lot to both of you, @ajnyga and @asmecher!
:-))) Bozana

Vitaliy · February 7, 2018, 9:28pm

So, if I have a similar problem, what should I do? Will this be fixed in the next OJS release, or do I need to edit the database?

bozana · February 7, 2018, 10:10pm

Hi @Vitaliy

If the problem is that the file (that was logged once) does not exist any more, then there is a fix in this GitHub Issue: consider missing submission file in usage stats loader · Issue #3332 · pkp/pkp-lib · GitHub.

The second problem here was that some public galley/supp files did not have a genre_id in the DB, due to a faulty import – this should be fixed manually in the DB.

Best,
Bozana

ajnyga · February 8, 2018, 6:01am

See above
Note that Fix genre assignment for upgrades · Issue #2506 · pkp/pkp-lib · GitHub was not related to this.

Vitaliy · February 8, 2018, 5:43pm

Hmm, I have checked the database, and it seems the error that we get after the upgrade has a different origin. I will open another topic for it.