Reducing DB size / archiving metrics

Hey All,

As we’re approaching our migration to OJS 3, we’re taking stock of our OJS 2 installation. Our test migration took a significant amount of time (as I’ve mentioned in another thread) – likely due to the sheer amount of data we have across our multi-journal install.

We have two particularly problematic tables:

+---------------------------------------+------------+
| Table                                 | Size in MB |
+---------------------------------------+------------+
| article_search_object_keywords        | 2948.01    |
| metrics                               | 2608.00    |
+---------------------------------------+------------+

I see there’s code to clear the index which would clean out the article_search_object_keywords table. Would doing that prior to migration help? Would clearing it now be advisable?

We also have metrics going back to 2009. Data beyond a few years is no longer actively used and it would be nice to mothball that set of data. Would a simple SQL dump be sufficient?

Thanks in advance!

Hi @bricas,

on the article_search_object_keywords table, it’s safe to run the following SQL:

DELETE FROM article_search_keyword_list;
DELETE FROM article_search_objects;
DELETE FROM article_search_object_keywords;

…before running your OJS 2.x to 3.x upgrade, then re-generate the index by running the following command-line script:

php tools/rebuildSearchIndex.php

This will speed up the upgrade script, though of course it’ll take a while to rebuild the index afterward.

Tagging @bozana, who can comment on the metrics!

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @bricas

Just to be sure: do you want to remove the old data from the DB table metrics because you do not need them any more? Or?

Best,
Bozana

Correct – though we would like to keep a copy of the data in the extreme case that we would want to have a look at it.

Hi @bricas

Because that table contains references to the objects in other tables, it would be good to keep the whole OJS DB dump, but maybe also the full OJS backup. That would be the minimum, I think.
But, because some references (e.g. for supp files) will change with the upgrade to OJS 3: would it be possible for you to use the whole DB table metrics for the upgrade to OJS 3 on a test installation? If so, maybe to backup that as well. That way you would have the correct/actual references (how they are in your new OJS 3) for those old entries as well.
If you keep your usage statistics log files, you could also re-process the old log files again – if you would need to do that for some reason – although I am not sure if the OJS 3 log file processing considers the old OJS 2 URLs e.g. for supp files – I would need to check that…

Best,
Bozana