[ojs] Export and remove stats before upgrade (2.x -> 3.x)

We’re getting very close to pulling the trigger on our 2.x to 3.x OJS upgrade.

I’ve noted (and posted about it before) that the upgrade process can take along time – likely, at least partially, due to all of our historical stats (~9.5 mil rows in our metrics table).

We thought this might be a good time to export the data and start fresh for 3.x – or at least mostly fresh. A recent funding round has gone by, so we may want to keep a few months of data to that point.

In any case, I’m wondering about strategies on how best to go about giving each journal their data and then clearing it out of the existing database.

Thanks in advance!

-Brian

Hi @bricas

I will tag @ajnyga here – he added some indexes on the DB table metrics and changed a few SQLs that significantly reduced the processing time. @ajnyga what indexes did you add? Can you remember if the SQL changes are in the release 3.1.1-4?
We are thinking about a way to compress the old statistics, e.g. to aggregate them somehow, e.g. to calculate just the total numbers for the older years… but maybe we would first like to see if and how the indexes help, because other users would like to keep the statistics non-aggregated…
If you still consider removing them, I would suggest that you then first backup everything and keep that backup, as well as to create all possible reports for those old years, per journal and year. Then give the reports to the journal managers/editors and also keep the backup of them.

Best,
Bozana

Hi,

I think the indexes I added were connected to pulling out metrics data. I have not done that in production but noticed that they also reduced the size of the metrics table. Unfortunately I wrote about it in Slack and do not have access to threads that old.

I believe that the slow upgrade process is connected to the use of subqueries in the upgrade sql’s. These are not a problem with moderately sized tables, but become very slow with large db’s. One solution is to change these to functions (like here pkp/pkp-lib#4131 use function to update 531 assoc_id's in metrics table by ajnyga · Pull Request #2131 · pkp/ojs · GitHub) or to add indexes I guess. With the latter it could be that they also make the upgrade slower, depends on the index I think.

@bricas make sure that you have this pr pkp/pkp-lib#4131 use function to update 531 assoc_id's in metrics table by ajnyga · Pull Request #2131 · pkp/ojs · GitHub before you upgrade. With that change an upgrade process that took around 34 was reduced to 9 hours. It basically changes a single sql query to a function. That single sql took about 70% of the total upgrade time in some larger OJS2 installations I upgraded recently.