Upgrade from OJS 3.0.1-1 to 3.1.1 takes a long time (1hr+)

As noted above, it was one of the submissions or submissionFiles schemas – I want to say submissionFiles, but i neglected to note it when it happened, sorry.

Hi @bricas,

Hmm, that is unusual. The only way to really capture what’s happening is to try enabling a database query log. But I’d say that’s unusually slow, especially on a machine where the much more laborious 2.x to 3.x upgrade ran more quickly.

Regards,
Alec Smecher
Public Knowledge Project Team

I am running an upgrade test on a large multijournal installation (around 100 journals) from 2.4.6 to 3.1.1.

It took around 7 hours to get around here: ojs/upgrade.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

However, it is this query that is really taking time: ojs/3.1.1_update.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

It has been running for 70 000 seconds now. That is almost 20 hours.

edit: starting to think it will not be completed. Maybe the metrics table is simply too big for that query. Dont know. The process does not seem to be stopped when I check the processlist.

1 Like

Hi @ajnyga,

Try the SHOW PROCESSLIST; SQL query to see what queries are running – you can usually determine whether it’s actually making progress using that.

Depending on whether or not metrics/reporting are crucial, you can also truncate that table.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi, I did, it is the query I highlited here: ojs/3.1.1_update.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 531 AND sfs.setting_name = ‘old-supp-id’ AND sfs.setting_value = m.assoc_id

The state of the query is “sending data” and the run time at the moment is 76 000 seconds.

Deleting the metrics data is really not an option, the journals want to keep their metrics.edit: or did you think of rebuilding the whole metrics table?

Ok so I ended the process and will try again.

The metrics table has 4235727 rows, there are 127 journals and 17500 submissions.

@bozana, it seems that the upgrade from OJS2 is now running the same really slow sql twice.

Is there a reason for that? It seems that those two queries take about 95% of the whole upgrade time.

Phew…
This seems to be the same as the post/issue here: ERROR: Upgrade failed: DB: MySQL server has gone away - #10 by celuloide

Unfortunately I do not have a right solution out of my head, but I will try to think about another way to solve it… I might need help… Whoever has an idea/suggestion…

:frowning:
Bozana

Hi @bozana, it is a similar query but not the exact same it seems.

But did you check the above question, I mean is it really needed to run that same query twice when updating from OJS2 to OJS3.1.1?

For solving the timeout error, I guess one option would be to use a script for the upgrade and a loop. It would mean a couple of million small queries but definitely no timeout…

What we did, and finally worked, was to copy the database into a VPS where we can set the mySQL timeout to a very large time. So once it finished successfully we could copy it back into the production server.

Hope this helps.
Regards.

Hi,

Yeah I am running the upgrade locally with XAMPP so I could of course let it run for two days if needed :smiley:

But I think that the double sql I mention above is a mistake and removing the other sql will cut down the upgrade time to half. I also think that the same sql is the reason why also the 3.0.1.1 to 3.1.1 upgrade takes so much time.

Hi @ajnyga

Hmmm… Yes, that query is double… Hmmm… Yes, I think it should be only once… I will try to figure out how come it is twice there…

EDIT: For this see: remove double SQL for metrics supp file migration · Issue #3597 · pkp/pkp-lib · GitHub

Thanks!
Bozana

2 Likes

Thanks, I guess that should only be for 3.x to 3.1.1 upgraders, but not sure which version of 3.

edit: ok, the way you fixed it makes sense, thanks!

1 Like

@bozana it seems that when I run the upgrade so that the query above is in 3.0.0_postupgrade_metrics.xml, the upgrade takes around 8 hours to complete. However, when I remove it from there and according the pull request you have there to 3.1.1_update.xml the upgrade takes a lot more time. That query alone has now been running for 14 hours. I am fairly sure it will a lot more hours because the last time I tried to run it in that phase, it took over 21 hours and I stopped the process.

The data is exactly the same.

So two question:

  1. does it matter in which phase you run it?
  2. why does it take a lot longer if you run it later? Is there maybe a index created to the metrics table in the meantime that affects this somehow?

edit: ok, so maybe there are just a lot more hits with that query https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.0_update.xml#L89-L90? So @bozana, is that something that is supposed to happen before this https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.1_update.xml#L32?

@bozana,
what if you would first have
UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 515 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
then
UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 531 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
and finally
UPDATE metrics SET assoc_type = 531 WHERE assoc_type = 515 AND assoc_id IN (SELECT file_id FROM submission_supplementary_files)
UPDATE metrics SET assoc_type = 531 WHERE assoc_type = 515 AND assoc_id IN (SELECT file_id FROM submission_artwork_files)

Would that mess something up? I think that mysql is struggling because running the two last queries first and the running the update with 531 has just so many hits. Maybe it will be faster in two separate smaller queries?

Hi @ajnyga

Ah, I think I am getting crazy from those migrations… :crazy_face: – It is so difficult to keep track and figure out when what has to be fixed and how and… :crazy_face: THANKS A LOT for thinking with me! :slight_smile:

So, I think: for the upgrade 3.1.1 I think that the lines https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.0_update.xml#L89-L90 would be necessary only if one is upgrading from an older OJS 3.0.x, because for those coming from OJS 2 this is already set correctly here https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.0.0_postupgrade_metrics.xml#L19.
That line was necessary earlier, when the 3.0.0_postupgrade_metrics.xml was wrong too.
But, however, I think this two SQLs do not last long, or?

And else, regarding your suggestion for having 3 smaller SQLs: I am not sure, I would need to try… The logic seems to be correct.
Would it be possible for you to test on your big data how long these 3 queries take?

Or maybe, after these lines https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.0_update.xml#L89-L90 are run, would it be possible to see how long these queries take:
SELECT m.assoc_id FROM metrics m, submission_file_settings sfs WHERE m.assoc_type = 531 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
comparing to the:
UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 531 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
i.e. if maybe this huge/complicated UPDATE is making problems… ?
Hmmm…
Or would it be possible for you to give me those 3-4 DB tables (metrics, submission_file_settings, submission_supplementary_files and submission_artwork_files) from you so that I can test it?

:thinking::crazy_face:

I will first see if the current upgrade script ever finishes. The query has now run for 72 000 seconds. Let’s see if it is finished by tomorrow morning :joy: But that would basically mean that the upgrade runs over 30 hours and I think that is just too much even with large installations.

After that I can test the queries you did with the data I have.

@bozana, it took between 22 to 28 hours to run that query (not sure when it finished because I was sleeping).

So the full upgrade took something like 30-32 hours.

After thinking about this, I think that I will do the actual production upgrade with the current script. It takes long, but works. I can test the queries you had anyway to get you the numbers in case some even larger installation needs other solutions.

Thanks!

Omg… :hushed:
Hmmm… :thinking: