Working with large metrics tables

Hi,

Tagging @bozana, @asmecher and @jmacgreg (for hosting experiences).

Our installation has around 70 journals now and about 20 of those have used the same OJS installation since 2008.

Our metrics table has 4,578,093 rows (almost 600 megabytes) and the number is rising fairly quickly now that we have more journals. This has been a bit of a headache during recent upgrades, where single SQL queries touching the metrics table have taken around 12-16 hours to complete.

But the bigger problem is that the table is getting slower to read. In particular, queries that look up things like the most read articles (for example tsvSite/TsvSiteThemePlugin.inc.php at master · ajnyga/tsvSite · GitHub) are taking several seconds to complete, and the problem will likely only get worse over the next couple of years.
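To give an idea of the kind of query involved, here is a simplified sketch of a "most read articles" lookup against the metrics table (not the plugin's exact SQL; the context_id and assoc_type values are only illustrative):

SELECT assoc_id, SUM(metric) AS total
FROM metrics
WHERE context_id = 1
  AND assoc_type = 257                -- article-level rows, value illustrative
  AND metric_type = 'ojs::counter'
GROUP BY assoc_id
ORDER BY total DESC
LIMIT 10;

Anything like this has to aggregate over a large share of the table, so it gets slower as the table grows.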

We could of course ask journals to save their old statistics and then delete, for example, rows older than three years from the table. But this would affect the download counters of individual articles, and some journals like to show the whole download history of an article to the public.

So what I am thinking is: should there be a way of moving metrics data to an "archive" of sorts? I mean that you could, for example, combine all downloads within a month into a single row for that month, essentially packing the data. This would of course mean that geographical and daily data would be lost, but it would turn roughly 30 rows (or more) into a single one, which would make the metrics table a lot smaller and faster to read. A rough sketch of the idea is below.
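Just to illustrate, a rough SQL sketch of how such packing could work (column names are taken from the metrics schema as I understand it; the cut-off value and the temporary table name are made up, and any NOT NULL constraints on the dropped columns would need checking):

-- 1. Build one row per object and month for old data, dropping day/country/city detail
CREATE TABLE metrics_packed AS
  SELECT context_id, assoc_type, assoc_id, metric_type, month,
         SUM(metric) AS metric
  FROM metrics
  WHERE month < '201501'              -- illustrative cut-off (YYYYMM)
  GROUP BY context_id, assoc_type, assoc_id, metric_type, month;

-- 2. Replace the detailed rows with the packed ones
DELETE FROM metrics WHERE month < '201501';
INSERT INTO metrics (context_id, assoc_type, assoc_id, metric_type, month, metric)
  SELECT context_id, assoc_type, assoc_id, metric_type, month, metric
  FROM metrics_packed;

The totals stay the same, so article-level download counters would not change; only the daily and geographical breakdowns for old data would be lost.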

Or are there other ways of making the metrics table perform faster, for example with the query I linked above?


Also very interested in this, as one of our instances at Scholars Portal has a metrics table that’s 949MB/6,124,089 rows and is very slow for metrics queries.

Dear community and PKP,

We’ve run into the same problem in our OJS 2 installation (my OJS 3 test installation shows similar behavior, so I suspect it is the very same problem).

The metrics have piled up for our journal (www.jstatsoft.org): the table now holds about 3 million rows, and given the high number of user visits the server started struggling (memory and CPU load). Generating one page currently takes ~6 seconds.

The reason seems to be this query:

SELECT SUM(metric) AS metric FROM metrics WHERE assoc_id = '1983' AND assoc_type = 257 AND context_id = '1' AND metric_type = 'ojs::counter' LIMIT 5000;

The Problem

There is no explicit index covering the combination of these four columns, so the query has to scan the whole table.
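One way to confirm this is to run EXPLAIN on the query (a sketch; the exact output will vary with your MySQL/MariaDB version):

EXPLAIN SELECT SUM(metric) AS metric
FROM metrics
WHERE assoc_id = '1983' AND assoc_type = 257
  AND context_id = '1' AND metric_type = 'ojs::counter';
-- before the index exists this reports a full table scan (type: ALL, key: NULL)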

The Solution

My fix was to add a fourth key to the table, covering the columns used by this specific query (see above).

ALTER TABLE metrics ADD KEY metrics_assoc_id_assoc_type_context_id_metric_type (assoc_id , assoc_type , context_id , metric_type);

The Result

This cut the query time from ~6 seconds to about 0.05 seconds in my test set and gives our system room to breathe :). As mentioned, I encountered very similar behaviour on our OJS 3 test installation; I still need to check whether it is the very same problem. If so, would it make sense to include this additional index/key in default OJS 3 installations?

Tags: @bozana, @asmecher and @jmacgreg

All best,
Reto


Hi @retostauffer,

As luck would have it, that’s already been added, starting with OJS 3.1.1! See Add index for metrics fetch · pkp/pkp-lib@7bda7c4 · GitHub for commit details.

Regards,
Alec Smecher
Public Knowledge Project Team


Hi @asmecher,

Awesome! Thanks for the quick reply, and sorry that I hadn't had time to check this myself; I should have seen that it was already fixed, my bad.

All best,
Reto