OJS usage stats and metrics table

Hi,

We’re running 2.4.7.1 and looking at upgrading to 3.1.0-1. I’m trying
to better understand the collection of stats data to determine what needs
to be migrated and what if anything can be left behind.

We currently have 2.1M rows in our ‘metrics’ table, with entries dating
back to 2010. We also have usage_events_YYYYMMDD.log files in our
files/usageStats/archive folder dating back to 2014, specifically to the
day we upgraded to version 2.4.3 when the new statistics stuff was added.

It appears that the metrics table is updated daily from the usage_events
log file via a scheduled task, which then archives the log file.

Questions:

  1. Do I need to keep the archived usage_events log files indefinitely?
    Are they used for anything, or can I back them up and delete them off of
    the server?

  2. Do I need all 2.1M rows in the ‘metrics’ table? Is there a way to delete
    older rows (apart from a manual SQL delete) and what are the implications
    of doing so? Which reports would be affected and how?

  3. Under the Timed Views report, there is a ‘Clear Logs’ option. Which
    data specifically does this remove? Log files or data in the ‘metrics’
    table or something else?

Thanks in advance…

Sandy


Good questions, @sandygordon. You are correct about the general workflow: usage is logged to files, and those files are then processed into the metrics table.

  1. The archived usage_event logs would only be needed if you ever wanted to rebuild the metrics table from the original logs. Why might you want to do this? One example: early in the development of the Usage Statistics functionality we found some bugs in the processing, which meant that the logs needed to be re-processed to re-generate the corrected statistics. With the statistics being overhauled for 3.x, I would lean strongly toward holding onto the original files, just in case a new bug is found. Another example is better discussed in the next point.
  2. The metrics table may contain legacy data that you don’t care about. For example, depending on how long you have been running OJS, you may have different “metric_types” which are deprecated at this point, such as the legacy counter metric type, or old timed views. There also might be facets of the current primary metric type (“ojs::counter”) which are not of interest. In recent versions of OJS 2.4.8.x, under “User Home” → “Journal Manager” → “System Plugins” → “Generic Plugins” → “Usage Statistics” → “Settings”, you’ll see that the “City” and “Region” dimensions are optional. Unchecking these and reprocessing your logs may substantially reduce the number of rows in your metrics table, if your users are spread across many geographical locations.
  3. The “Clear Logs” option within the “Timed Views” plugin in 2.4.7 will clear the records with metric type “ojs::timedViews” from the metrics table.
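To see how much of the metrics table belongs to each metric type before deciding what to keep, a grouped count is probably enough. Here is a minimal sketch in Python. It runs against an in-memory SQLite stand-in rather than a real OJS database, the table is reduced to a few columns, and the helper name `rows_per_metric_type` is made up for illustration. Only the `metric_type` column and the two type names come from the points above.

```python
import sqlite3


def rows_per_metric_type(conn):
    """Return {metric_type: row_count} for the metrics table."""
    cur = conn.execute(
        "SELECT metric_type, COUNT(*) FROM metrics GROUP BY metric_type"
    )
    return dict(cur.fetchall())


# Stand-in for the real table: an assumption, reduced to a few columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (metric_type TEXT, assoc_id INTEGER, metric INTEGER)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        ("ojs::counter", 101, 5),
        ("ojs::counter", 102, 2),
        ("ojs::counter", 103, 9),
        ("ojs::timedViews", 101, 4),
        ("ojs::timedViews", 102, 1),
    ],
)

counts = rows_per_metric_type(conn)
print(counts)
```

The same SELECT can be run directly in your database client against the real table; the Python wrapper is only there to make the example self-contained.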

Edit: note also that you may want to consider the duplicate coverage of both the OJS usage statistics logs and the Apache access logs. The OJS metrics table can be rebuilt from either (though this will be more challenging from the Apache access logs).

Hi Clinton,

Thanks for the quick reply. This is all very helpful information. I suspect we will end up keeping all of our metrics data, at least for this upcoming upgrade.

Sandy

Hi @ctgraham,

We are migrating from 2.4.6 to 3.1.1. PLUS moving the journal under a new ojs “site”. Previously it had its own site and one single journal. Now we installed a new site hosting a couple of journals. We have already upgraded and users and articles are also imported to the new journal. Only stats are left. I was wondering what is the best approach to migrate the stats. 1) copy metrics table only from the legacy database and append to the new metrics table OR 2) Moving usageStats files and running runScheduledTasks.php

The problem is that journal_id will be different in the new site. If we take first approach we need to manually update journal_id column before appending.

Thanks in advance.
Ghazal

Update:
I ran a quick test. I copied an old usageStats/archive log file into usagestats/stage and ran:

php tools/runScheduledTasks.php lib/pkp/plugins/generic/usageStats/scheduledTasks.xml

A new record was added to metrics table. However when I generated a report using “View Report” the added stat was not recorded there. It was there in PKP usage statistics report though. View Report is much more important for us. Is there any way to fix this?

Thanks

If you have migrated the articles and issues manually, the ids for the galleys, articles, issues, and journal have probably changed, not just the journal id. This will prevent the usage stats tool from rebuilding the metrics table successfully from the logs with the old ids. It will also mean that if you were to migrate the metrics table directly, you would have to translate the assoc_ids for all of these objects.

The only current methodology for moving from a multi-journal install to a single journal install is to copy the entire multi-journal install and then delete every other journal.

The PKP Technical Committee is actively discussing better ways to migrate data from one install to another.

Hi @ctgraham,

Actually, we are moving from a single journal install to a multi-journal install. Is there still any way to move the stats? That’s really worrying for us. Or is there any other way to migrate the articles so that we can use the old log files to regenerate the stats?

Regards
Ghazal

Ah, sorry. I misread the direction of your migration.

Unfortunately, you will need to translate the value of each galley, article, and issue in either the metrics table data or in the historic access logs in order to move the metrics from the single journal install to the multijournal install. That is, if article 101 in the original site is now article 200 in the multijournal site, you would need to look at the metrics.submission_id and metrics.assoc_id (where the metrics.assoc_type is 257), and replace the old 101 value with the new 200 value.
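As a rough illustration of that translation, here is a sketch in Python against an in-memory SQLite stand-in. It is not a tested migration script: the reduced table layout, the `id_map` staging table, the helper name `remap_article_ids`, and the placeholder assoc_type 999 are all assumptions for the example. Only `submission_id`, `assoc_id`, `assoc_type`, and the value 257 come from the description above. The mapping is staged in a table and applied in one UPDATE per column, so that an id which is both an old and a new value (101 → 200, 200 → 305) is not translated twice.

```python
import sqlite3

ASSOC_TYPE_ARTICLE = 257  # value quoted above for article records


def remap_article_ids(conn, id_map):
    """Translate old article ids to new ones in the metrics table."""
    # Stage the old -> new pairs (hypothetical helper table).
    conn.execute(
        "CREATE TEMP TABLE id_map (old_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO id_map VALUES (?, ?)", id_map.items())
    # assoc_id holds an article id only where assoc_type is 257.
    conn.execute(
        """UPDATE metrics
           SET assoc_id = (SELECT new_id FROM id_map WHERE old_id = metrics.assoc_id)
           WHERE assoc_type = ? AND assoc_id IN (SELECT old_id FROM id_map)""",
        (ASSOC_TYPE_ARTICLE,),
    )
    # submission_id is translated on every row that carries an old id.
    conn.execute(
        """UPDATE metrics
           SET submission_id = (SELECT new_id FROM id_map WHERE old_id = metrics.submission_id)
           WHERE submission_id IN (SELECT old_id FROM id_map)"""
    )
    conn.execute("DROP TABLE id_map")
    conn.commit()


# Stand-in table: an assumption, reduced to the columns discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (submission_id INTEGER, assoc_type INTEGER, assoc_id INTEGER, metric INTEGER)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [
        (101, 257, 101, 5),  # article-level record for old article 101
        (101, 999, 55, 3),   # non-article record (placeholder type) under article 101
        (200, 257, 200, 7),  # old article 200, which becomes 305
    ],
)

remap_article_ids(conn, {101: 200, 200: 305})
rows = conn.execute(
    "SELECT submission_id, assoc_type, assoc_id, metric FROM metrics ORDER BY metric"
).fetchall()
print(rows)
```

Galley and issue records would need their own old-to-new maps applied to `assoc_id` under their own assoc_type values, and the journal id would need the same treatment. Running anything like this against a copy of the database first seems prudent.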