Note that the first SubmissionFile object does not cause any problems, but the second one triggers the error. Is the reason the missing genre_id, or something else? I ask because I have 2000 rows in submission_files with genre_id set to NULL.
Hi @ajnyga, the missing genre_id in submission_files is indeed the problem. As far as I know, all files should have a genre_id (i.e. I don't know how your files ended up with genre_id = NULL), but it is not actually required by the DB table schema. Maybe @asmecher knows of cases where files could have genre_id = NULL? In that case I would have to handle that in the script.
What kind of files are those that miss the genre_id in your DB table? Are they all article full texts?
Maybe first investigate only the galleys, because they are publicly available and are causing the problem here. Start by checking whether some of them are supp or artwork files, with something like the query below. Note that the condition must be genre_id IS NULL; genre_id = NULL never matches any row in SQL.
SELECT file_id FROM submission_files WHERE genre_id IS NULL AND assoc_type = 521 AND (file_id IN (SELECT file_id FROM submission_supplementary_files) OR file_id IN (SELECT file_id FROM submission_artwork_files))
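To see why IS NULL matters, here is a small sketch that runs the same query against an in-memory SQLite database, once with = NULL and once with IS NULL. The tables are a hypothetical, heavily simplified stand-in for the OJS schema (only the columns the query touches), and the rows are made-up sample data; assoc_type = 521 is the galley value used in the query above.

```python
import sqlite3

# Hypothetical, simplified subset of the OJS tables: only the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submission_files (file_id INTEGER PRIMARY KEY, genre_id INTEGER, assoc_type INTEGER);
    CREATE TABLE submission_supplementary_files (file_id INTEGER);
    CREATE TABLE submission_artwork_files (file_id INTEGER);
    INSERT INTO submission_files VALUES
        (1, 1, 521),     -- galley with a genre
        (2, NULL, 521),  -- galley without genre, not supp/artwork
        (3, NULL, 521),  -- galley without genre, supplementary file
        (4, NULL, 999);  -- not a galley (some other assoc_type)
    INSERT INTO submission_supplementary_files VALUES (3);
""")


def galleys_without_genre(null_check):
    """Run the supp/artwork galley query with the given NULL test ("= NULL" or "IS NULL")."""
    sql = f"""
        SELECT file_id FROM submission_files
        WHERE genre_id {null_check} AND assoc_type = 521
          AND (file_id IN (SELECT file_id FROM submission_supplementary_files)
               OR file_id IN (SELECT file_id FROM submission_artwork_files))
        ORDER BY file_id
    """
    return [row[0] for row in conn.execute(sql)]


buggy = galleys_without_genre("= NULL")   # comparison with NULL is never true
fixed = galleys_without_genre("IS NULL")  # matches rows whose genre_id is NULL
print(buggy)  # []
print(fixed)  # [3]
```

The = NULL version silently returns nothing, which could easily be misread as "no supp or artwork galleys are affected".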
The ones that are not supp or artwork files are probably full texts, right? For those you could use something like this SQL (note IS NULL rather than = NULL):
UPDATE submission_files sf, genres g, submissions s SET sf.genre_id = g.genre_id WHERE sf.genre_id IS NULL AND g.entry_key = 'SUBMISSION' AND g.context_id = s.context_id AND s.submission_id = sf.submission_id AND sf.assoc_type = 521 AND sf.file_id NOT IN (SELECT file_id FROM submission_supplementary_files) AND sf.file_id NOT IN (SELECT file_id FROM submission_artwork_files)
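The multi-table UPDATE above is MySQL-specific syntax. As a sketch under assumptions, the same backfill can be written with a correlated subquery, which lets it be tried out in SQLite first. The schema and rows below are hypothetical, simplified sample data (two journals, each with its own SUBMISSION genre), not the real OJS tables; test anything like this on a copy of the database before touching production.

```python
import sqlite3

# Hypothetical, simplified subset of the OJS tables: only the columns the update uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (submission_id INTEGER PRIMARY KEY, context_id INTEGER);
    CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, context_id INTEGER, entry_key TEXT);
    CREATE TABLE submission_files (
        file_id INTEGER PRIMARY KEY, submission_id INTEGER,
        genre_id INTEGER, assoc_type INTEGER);
    CREATE TABLE submission_supplementary_files (file_id INTEGER);
    CREATE TABLE submission_artwork_files (file_id INTEGER);

    -- Two journals (contexts), each with its own SUBMISSION genre.
    INSERT INTO genres VALUES (1, 1, 'SUBMISSION'), (5, 1, 'IMAGE'), (11, 2, 'SUBMISSION');
    INSERT INTO submissions VALUES (100, 1), (200, 2);
    INSERT INTO submission_files VALUES
        (1, 100, NULL, 521),  -- full-text galley in journal 1 -> should get 1
        (2, 200, NULL, 521),  -- full-text galley in journal 2 -> should get 11
        (3, 100, NULL, 521),  -- supplementary file -> left alone
        (4, 200, 7,    521),  -- already has a genre -> left alone
        (5, 100, NULL, 999);  -- not a galley -> left alone
    INSERT INTO submission_supplementary_files VALUES (3);
""")

# The correlated subquery picks the SUBMISSION genre of the journal
# that owns each file's submission.
conn.execute("""
    UPDATE submission_files
    SET genre_id = (
        SELECT g.genre_id
        FROM genres g
        JOIN submissions s ON g.context_id = s.context_id
        WHERE g.entry_key = 'SUBMISSION'
          AND s.submission_id = submission_files.submission_id
    )
    WHERE genre_id IS NULL
      AND assoc_type = 521
      AND file_id NOT IN (SELECT file_id FROM submission_supplementary_files)
      AND file_id NOT IN (SELECT file_id FROM submission_artwork_files)
""")

result = conn.execute(
    "SELECT file_id, genre_id FROM submission_files ORDER BY file_id").fetchall()
print(result)  # [(1, 1), (2, 11), (3, None), (4, 7), (5, None)]
```

Looking up the genre per journal matters because each journal has its own SUBMISSION genre row, so a single hard-coded genre_id would only be right for one of them.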
For the supp or artwork files, you may want to choose an appropriate genre key instead of SUBMISSION.
I am not sure whether this is related to the issue above, because in that issue the files should have genre_id = 1, not NULL.
Hm, it seems that most of them are reviewer reports. I will check which file was causing the error, and whether it is actually a single case of a galley file with a NULL section_id.
Is there a specific file_stage I should be looking at with these cases?
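One way to narrow this down from the DB side is to count the rows without a genre per file_stage, as in the sketch below. It uses an in-memory SQLite table with a hypothetical, simplified schema, and the stage numbers and rows are made-up sample values, not a statement about which stage holds which kind of file.

```python
import sqlite3

# Hypothetical, simplified table with made-up rows; stage numbers are sample values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submission_files (file_id INTEGER PRIMARY KEY, file_stage INTEGER, genre_id INTEGER);
    INSERT INTO submission_files VALUES
        (1, 10, NULL), (2, 10, NULL), (3, 5, NULL),
        (4, 10, 1),    (5, 5, NULL),  (6, 5, NULL);
""")

# Count files with a missing genre, grouped by stage.
rows = conn.execute("""
    SELECT file_stage, COUNT(*) AS files_without_genre
    FROM submission_files
    WHERE genre_id IS NULL
    GROUP BY file_stage
    ORDER BY file_stage
""").fetchall()
print(rows)  # [(5, 3), (10, 2)]
```

On a real database the same GROUP BY (optionally joined to submissions to add the journal's context_id) shows at a glance whether the NULL genres are concentrated in one stage or one journal.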
If you have <revision genre=""> in the import xml, apparently you end up with NULL values in the database…
But this is solved. I just checked the SUBMISSION genre_id for the journal and added those values to cases where genre_id was null and stage was 10. They all came from a single journal. We have only imported full texts so it was a safe bet. Thanks once again @bozana!
Ah, yes, that could be. Now, I think, such an import would not be possible: there would be an appropriate import result message (saying that the genre could not be found)…
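For older export files, a quick pre-check before importing could flag every element carrying an empty genre attribute. This is a minimal sketch with Python's xml.etree: apart from revision and its genre attribute, the element and attribute names in the sample (article, submission_file, number, filename) are hypothetical, and the code strips any {namespace} prefix so it does not depend on the real import schema's namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only <revision genre="..."> comes from the discussion above.
SAMPLE = """<article>
  <submission_file id="1">
    <revision number="1" genre="Article Text" filename="a.pdf"/>
  </submission_file>
  <submission_file id="2">
    <revision number="1" genre="" filename="b.pdf"/>
  </submission_file>
</article>"""


def empty_genre_elements(xml_text):
    """Return (local tag name, filename attribute) for every element whose genre is blank."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        genre = elem.get("genre")
        if genre is not None and not genre.strip():
            local_tag = elem.tag.rsplit("}", 1)[-1]  # drop a {namespace} prefix if present
            hits.append((local_tag, elem.get("filename")))
    return hits


hits = empty_genre_elements(SAMPLE)
print(hits)  # [('revision', 'b.pdf')]
```

Catching these before the import is cheaper than backfilling genre_id in the database afterwards.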
Thanks a lot @ajnyga!
Another error, this time on line 77, with $articleFile->getGenreId(). I am debugging this and will let you know what it is. Two log files were already processed successfully.
This is something different. @bozana, I would bet some money that you will get a lot of similar questions in the forum during the next few months.
The submission_file in this case is missing altogether.
What happens with the current code in a scenario where:
A journal adds galley file A and publishes the article. The file_id is saved in the usage stats log.
Before the logs are processed, the journal removes galley file A and adds galley file B.
Scheduled tasks are run; the submission_file with the ID mentioned in the logs cannot be found => error on line 77, right?
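The failure in this scenario comes from calling a method on a file lookup that returned nothing. A defensive version of the processing loop would check for that and skip (and record) the entry instead of stopping the whole log. The sketch below is Python rather than the actual PHP task, and SubmissionFile, FILES, get_submission_file and process_log are hypothetical stand-ins for the real DAO lookup and stats code; it also flags the NULL-genre case from earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmissionFile:
    file_id: int
    genre_id: Optional[int]


# Hypothetical stand-in for the DB lookup done for each log entry.
# File 1 (galley A) was deleted after it was logged; file 3 has no genre.
FILES = {
    2: SubmissionFile(file_id=2, genre_id=1),     # galley B
    3: SubmissionFile(file_id=3, genre_id=None),  # bad import, genre_id NULL
}


def get_submission_file(file_id: int) -> Optional[SubmissionFile]:
    return FILES.get(file_id)


def process_log(entries):
    """Split (file_id, hits) entries into processable ones and skipped ones with a reason."""
    processed, skipped = [], []
    for file_id, hits in entries:
        article_file = get_submission_file(file_id)
        if article_file is None:
            # File removed after being logged: record it and move on instead of crashing.
            skipped.append((file_id, "file not found"))
            continue
        if article_file.genre_id is None:
            skipped.append((file_id, "genre_id is NULL"))
            continue
        processed.append((file_id, article_file.genre_id, hits))
    return processed, skipped


processed, skipped = process_log([(1, 5), (2, 3), (3, 4)])
print(processed)  # [(2, 1, 3)]
print(skipped)    # [(1, 'file not found'), (3, 'genre_id is NULL')]
```

Skipping keeps the rest of the log processable, but those hits are dropped from the stats, so reporting the skipped entries (rather than ignoring them silently) seems worth the extra line.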
edit: this scenario definitely needs fixing. I removed the references to the removed file and the log file got processed. But basically anyone could run into a similar problem.
Yes, those did not turn out to be the issue here. There were actually two separate cases: the first one was due to a bad XML import, and the second one is the scenario Bozana describes in the GitHub issue above.
The second problem here was that some public galley/supp files did not have a genre_id in the DB because of the faulty import; this should be fixed manually in the DB.