Upgrade from OJS 3.0.1-1 to 3.1.1 takes a long time (1hr+)

We’re still in the testing phase, but our initial 2.x upgrade to 3.x took quite some time, which was expected.

Unfortunately, a simple point-version upgrade (3.0.1-1 to 3.1.1) ended up taking a significant amount of time as well. It seemed to hang on the submissions/submissionFiles schemas (i wish i could remember exactly which, sorry!) I let it run for an hour and then I had to leave – it was finished by the next morning.

We’re running a multi-journal install with a relatively large volume of of issues and articles. Is there a known change in the last release that would take a long time to go through – or are we stuck with significant down-time for 3.x upgrades? For reference, I had seen 2.x upgrades take in the range of 10 to 20 minutes, but never more than that.

Thanks in advance,

-Brian

Hi @bricas,

The OJS 2.x to 3.x update involves a lot of changes with respect to the file model, including a complete re-arrangement of the file storage area, so it’s legitimately possible that it takes a long time to run. I’d suggest timing a fairly representative test upgrade so that you know how long it’ll take (and have a fair sense that it’ll complete successfully). Then when you do the final upgrade, either take down the site in the meantime with a “come-back-later” notice, or add something very visible to indicate that any changes made in the interim may be lost.

Regards,
Alec Smecher
Public Knowledge Project Team

Alec,

I understand 2.x → 3.x could be a lengthy process – and it was.

However, a subsequent upgrade from 3.0.1-1 to 3.1.1 also took a significant amount of time, well over an hour at least.

I’m asking if there’s an issue for all 3.x → 3.x upgrades or was there something particular about 3.1.1 that would cause this.

-Brian

Hi @bricas,

Hmm, the 3.0.1-1 to 3.1.1 upgrade shouldn’t take that long – do you know what step (from the command-line output) consumed all that time?

Regards,
Alec Smecher
Public Knowledge Project Team

As noted above, it was one of the submissions or submissionFiles schemas – I want to say submissionFiles, but i neglected to note it when it happened, sorry.

Hi @bricas,

Hmm, that is unusual. The only way to really capture what’s happening is to try enabling a database query log. But I’d say that’s unusually slow, especially on a machine where the much more laborious 2.x to 3.x upgrade ran more quickly.

Regards,
Alec Smecher
Public Knowledge Project Team

I am running an upgrade test on a large multijournal installation (around 100 journals) from 2.4.6 to 3.1.1.

It took around 7 hours to get around here: ojs/upgrade.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

However, it is this query that is really taking time: ojs/3.1.1_update.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

It has been running for 70 000 seconds now. That is almost 20 hours.

edit: starting to think it will not be completed. Maybe the metrics table is simply too big for that query. Dont know. The process does not seem to be stopped when I check the processlist.

1 Like

Hi @ajnyga,

Try the SHOW PROCESSLIST; SQL query to see what queries are running – you can usually determine whether it’s actually making progress using that.

Depending on whether or not metrics/reporting are crucial, you can also truncate that table.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi, I did, it is the query I highlited here: ojs/3.1.1_update.xml at ojs-stable-3_1_1 · pkp/ojs · GitHub

UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 531 AND sfs.setting_name = ‘old-supp-id’ AND sfs.setting_value = m.assoc_id

The state of the query is “sending data” and the run time at the moment is 76 000 seconds.

Deleting the metrics data is really not an option, the journals want to keep their metrics.edit: or did you think of rebuilding the whole metrics table?

Ok so I ended the process and will try again.

The metrics table has 4235727 rows, there are 127 journals and 17500 submissions.

@bozana, it seems that the upgrade from OJS2 is now running the same really slow sql twice.

Is there a reason for that? It seems that those two queries take about 95% of the whole upgrade time.

Phew…
This seems to be the same as the post/issue here: ERROR: Upgrade failed: DB: MySQL server has gone away - #10 by celuloide

Unfortunately I do not have a right solution out of my head, but I will try to think about another way to solve it… I might need help… Whoever has an idea/suggestion…

:frowning:
Bozana

Hi @bozana, it is a similar query but not the exact same it seems.

But did you check the above question, I mean is it really needed to run that same query twice when updating from OJS2 to OJS3.1.1?

For solving the timeout error, I guess one option would be to use a script for the upgrade and a loop. It would mean a couple of million small queries but definitely no timeout…

What we did, and finally worked, was to copy the database into a VPS where we can set the mySQL timeout to a very large time. So once it finished successfully we could copy it back into the production server.

Hope this helps.
Regards.

Hi,

Yeah I am running the upgrade locally with XAMPP so I could of course let it run for two days if needed :smiley:

But I think that the double sql I mention above is a mistake and removing the other sql will cut down the upgrade time to half. I also think that the same sql is the reason why also the 3.0.1.1 to 3.1.1 upgrade takes so much time.

Hi @ajnyga

Hmmm… Yes, that query is double… Hmmm… Yes, I think it should be only once… I will try to figure out how come it is twice there…

EDIT: For this see: remove double SQL for metrics supp file migration · Issue #3597 · pkp/pkp-lib · GitHub

Thanks!
Bozana

2 Likes

Thanks, I guess that should only be for 3.x to 3.1.1 upgraders, but not sure which version of 3.

edit: ok, the way you fixed it makes sense, thanks!

1 Like

@bozana it seems that when I run the upgrade so that the query above is in 3.0.0_postupgrade_metrics.xml, the upgrade takes around 8 hours to complete. However, when I remove it from there and according the pull request you have there to 3.1.1_update.xml the upgrade takes a lot more time. That query alone has now been running for 14 hours. I am fairly sure it will a lot more hours because the last time I tried to run it in that phase, it took over 21 hours and I stopped the process.

The data is exactly the same.

So two question:

  1. does it matter in which phase you run it?
  2. why does it take a lot longer if you run it later? Is there maybe a index created to the metrics table in the meantime that affects this somehow?

edit: ok, so maybe there are just a lot more hits with that query https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.0_update.xml#L89-L90? So @bozana, is that something that is supposed to happen before this https://github.com/pkp/ojs/blob/master/dbscripts/xml/upgrade/3.1.1_update.xml#L32?

@bozana,
what if you would first have
UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 515 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
then
UPDATE metrics m, submission_file_settings sfs SET m.assoc_id = sfs.file_id WHERE m.assoc_type = 531 AND sfs.setting_name = 'old-supp-id' AND sfs.setting_value = m.assoc_id
and finally
UPDATE metrics SET assoc_type = 531 WHERE assoc_type = 515 AND assoc_id IN (SELECT file_id FROM submission_supplementary_files)
UPDATE metrics SET assoc_type = 531 WHERE assoc_type = 515 AND assoc_id IN (SELECT file_id FROM submission_artwork_files)

Would that mess something up? I think that mysql is struggling because running the two last queries first and the running the update with 531 has just so many hits. Maybe it will be faster in two separate smaller queries?