Reprocessing usage stats in OJS 3.0.2

Hi @bozana,

I am planning to reprocess our usage statistics, because the GeoLite file has so far not been available in our usageStats plugin.

I understand that I only need to move the archived usage log files to the other folder, but to make sure I do not make any mistakes, I have a couple of questions:

  1. Should I delete the data from the metrics table, or does the old data get overwritten? Is there a risk that we get double entries in the metrics table, i.e. two rows for the same hit?
  2. When I copy the files to the other folder, I should probably remove them from the archive folder, right?
  3. Is there a time limit for an Acron task? Going through all the log files could take a long time.
  4. Can I reprocess just part of the log files, for example just the year 2016?
  5. We recently started using https. Is there any chance that this will cause problems while reprocessing the logs?
  6. Is there anything else I need to do besides copying the GeoLite file to the plugin folder and moving the log files?

Hi @ajnyga,

  1. Actually, you do not have to delete the data from the metrics table: entries with the same log file name, i.e. the same value in the load_id column of the metrics table, will be overwritten. When a file is processed, the function _loadData is called (see PKPUsageStatsLoader.inc.php in pkp/pkp-lib, branch ojs-stable-3_0_2), which in turn calls $metricsDao->purgeLoadBatch($loadId); that call removes all entries with that load_id. So double-check that the file names are identical to the existing load_ids.
  2. Yes, remove them. But to be on the safe side, please make a backup of everything first!
  3. The time limit is defined by your server, in php.ini. So maybe do it step by step, or with a cron job, where the time limit is not so strict? (When PHP runs from the command line, max_execution_time defaults to 0, i.e. no limit.)
  4. Yes. Because of point 1 above, I think you could reprocess just part of the log files.
  5. Hmmm… Here I am not sure :-\ I think only the base URL configuration in config.inc.php has to match the URLs recorded in the log files.
  6. Are your plugin settings as you want them, i.e. city and region? The GeoLite file goes into the usage statistics plugin folder in pkp-lib, I think. Otherwise, I think that’s it :slight_smile:
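To illustrate point 1, here is a minimal Python/SQLite sketch of the purge-then-insert behaviour. It assumes a heavily simplified metrics table: only the load_id column comes from this thread, while the other columns and the log file name are hypothetical. Reloading a file with the same load_id replaces its earlier rows instead of adding to them:

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the OJS "metrics" table.
# Only the load_id column comes from the thread; the other columns are illustrative.
SCHEMA = """
CREATE TABLE metrics (
    load_id       TEXT NOT NULL,
    submission_id INTEGER,
    country_id    TEXT,
    metric        INTEGER
)
"""


def load_file(conn, load_id, rows):
    """(Re)load the rows produced from one log file.

    Mimics the purge-then-insert idea described for purgeLoadBatch($loadId);
    it is not a port of the PHP loader.
    """
    with conn:  # single transaction: commit on success, roll back on error
        # Drop whatever an earlier run stored for this file...
        conn.execute("DELETE FROM metrics WHERE load_id = ?", (load_id,))
        # ...then insert the freshly computed rows.
        conn.executemany(
            "INSERT INTO metrics (load_id, submission_id, country_id, metric) "
            "VALUES (?, ?, ?, ?)",
            [(load_id, sid, country, count) for sid, country, count in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Hypothetical file name. First run without GeoLite: no country resolved.
load_file(conn, "usage_events_20160101.log", [(42, None, 3)])
# Reprocessing the same file with GeoLite available: same load_id, new rows.
load_file(conn, "usage_events_20160101.log", [(42, "FI", 2), (42, "SE", 1)])

total = conn.execute("SELECT SUM(metric) FROM metrics").fetchone()[0]
print(total)  # 3: the old row was purged, so the hits are not counted twice
```

The same reasoning explains the caveat in point 1: if a reprocessed file ended up with a different load_id than the one already stored, nothing would be purged and the hits would be counted twice.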

It would be best to test everything first on a test installation, but of course that would mean more work :-\
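The procedure from points 2 to 4 (back up, then move only the files you want reprocessed out of the archive folder, a few at a time) could be scripted roughly as below. This is a sketch under assumptions: the function is hypothetical, and both the usage_events_YYYYMMDD.log naming scheme and the usageStats/archive and usageStats/stage folder names in the demo are guesses, so check them against your own files directory before running anything.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumption: archived logs are named like usage_events_YYYYMMDD.log.
LOG_NAME = re.compile(r"^usage_events_(\d{4})(\d{2})(\d{2})\.log$")


def stage_for_reprocessing(archive_dir, stage_dir, backup_dir, year=None, limit=None):
    """Back up the archive, then move selected log files into the stage folder.

    year:  only move files whose name encodes this year (e.g. 2016).
    limit: move at most this many files per run, to keep each task run short.
    Returns the names of the moved files.
    """
    archive_dir, stage_dir = Path(archive_dir), Path(stage_dir)
    # 1. Copy the whole archive before anything is moved (Python 3.8+).
    shutil.copytree(archive_dir, backup_dir, dirs_exist_ok=True)

    selected = []
    for path in sorted(archive_dir.iterdir()):
        match = LOG_NAME.match(path.name)
        if match is None:
            continue  # skip anything that is not a recognised usage log
        if year is not None and int(match.group(1)) != year:
            continue
        selected.append(path)
    if limit is not None:
        selected = selected[:limit]

    # 2. Move (not copy), so the files no longer sit in the archive folder.
    stage_dir.mkdir(parents=True, exist_ok=True)
    for path in selected:
        shutil.move(str(path), str(stage_dir / path.name))
    return [p.name for p in selected]


if __name__ == "__main__":
    # Demo in a throwaway directory; the folder names are assumptions.
    root = Path(tempfile.mkdtemp())
    archive = root / "usageStats" / "archive"
    archive.mkdir(parents=True)
    for name in ["usage_events_20151231.log", "usage_events_20160101.log",
                 "usage_events_20160102.log", "notes.txt"]:
        (archive / name).write_text("")
    moved = stage_for_reprocessing(archive, root / "usageStats" / "stage",
                                   root / "backup", year=2016, limit=1)
    print(moved)  # ['usage_events_20160101.log']
```

Running it repeatedly with a small limit is one way to keep each run short, in the spirit of point 3.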

Best!
Bozana

I’m a bit worried about point 5, because we also changed our domain name in January.

So the journals basically have three types of addresses in the log files:

  1. original URLs like http://ojs.olddomain.fi/index.php/journalpath
  2. after January: http://newdomain.fi/journalpath
  3. after May: https://newdomain.fi/journalpath

So, is there a possibility that the log files from 2016 will not be taken into account because they contain the old URLs? Can you think of any way to solve this? :smiley:
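One way to check which logged addresses would line up with a given base URL is to compare scheme, host and path prefix. The Python sketch below assumes, following answer 5, that a logged URL only counts when it falls under the configured base URL; the actual loader's matching logic may well differ, the function is hypothetical, and the domains are the placeholders from this post.

```python
from urllib.parse import urlsplit

# The three address forms listed above (placeholder domains from the post).
BASE_URLS = {
    "original": "http://ojs.olddomain.fi/index.php",
    "january": "http://newdomain.fi",
    "may": "https://newdomain.fi",
}


def matching_base(logged_url, base_urls=BASE_URLS):
    """Return the key of the base URL a logged URL falls under, or None.

    Scheme and host must match exactly (so http and https differ); the path
    must start with the base path on a "/" boundary.
    """
    url = urlsplit(logged_url)
    for key, base in base_urls.items():
        b = urlsplit(base)
        if (url.scheme, url.netloc) != (b.scheme, b.netloc):
            continue
        base_path = b.path.rstrip("/")
        if url.path == base_path or url.path.startswith(base_path + "/"):
            return key
    return None


for logged in ["http://ojs.olddomain.fi/index.php/journalpath",
               "http://newdomain.fi/journalpath",
               "https://newdomain.fi/journalpath"]:
    print(logged, "->", matching_base(logged))
# prints "original", "january" and "may" respectively

# With only the current https address configured, the older forms match nothing:
print(matching_base("http://newdomain.fi/journalpath",
                    {"current": "https://newdomain.fi"}))  # None
```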

Hmmm… Maybe do it in parts and adapt the base URL in the config each time. That would mean that you would either have to stop using the current production system during the processing, or do it somewhere else (with a DB dump and a copy of the files from that production system) and then somehow transfer the results into the real metrics table? Hmmm…
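If the files are processed in parts, the batches could be formed by date, one batch per base URL period. A minimal Python sketch of that grouping, assuming usage_events_YYYYMMDD.log file names; the function is hypothetical, and the cutoff dates in the test are placeholders, because the thread only says "January" and "May".

```python
import re
from bisect import bisect_right
from datetime import date

# Assumed log naming scheme (usage_events_YYYYMMDD.log); adjust if yours differs.
LOG_NAME = re.compile(r"^usage_events_(\d{4})(\d{2})(\d{2})\.log$")


def batches_by_period(file_names, cutoffs, base_urls):
    """Group log files into one batch per base URL period.

    cutoffs:   sorted dates on which the address changed; a file dated on or
               after cutoffs[i] belongs to period i + 1.
    base_urls: len(cutoffs) + 1 base URLs, one per period; each batch would be
               processed with base_url in config.inc.php set to that value.
    Returns a list of (base_url, [file names]) pairs in period order.
    """
    if len(base_urls) != len(cutoffs) + 1:
        raise ValueError("need exactly one base URL per period")
    batches = [(url, []) for url in base_urls]
    for name in sorted(file_names):
        match = LOG_NAME.match(name)
        if match is None:
            continue  # not a usage log file
        day = date(*map(int, match.groups()))
        # Number of cutoffs on or before this day = index of its period.
        period = bisect_right(cutoffs, day)
        batches[period][1].append(name)
    return batches
```

Each batch would then be staged and processed on its own, with the matching base URL configured, before moving on to the next one.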

Thanks, I will try it locally. One last question: did the metrics table change in the OJS 2 to OJS 3 upgrade?

Yes, that is a good plan! :slight_smile:
Yes, the table changed…

I think the GeoLite file should go in the plugins/generic/usageStats folder, not under pkp-lib?