I am planning to reprocess our usageStats because we have not had the GeoLite file available in our usageStats plugin.
I understand that I only need to move the archived usage log files to the other folder, but just to make sure I do not make any mistakes, a couple of questions:
Should I erase data from the metrics table, or does the old data get overwritten? Is there a risk that we get double entries in the metrics table, I mean two rows for the same hit?
When I copy the files to the other folder, I should probably remove them from the archive folder, right?
Is there a time limit for an acron task? I mean, it could take a long time to go through all the log files.
Can I reprocess just part of the log files, for example just the year 2016?
We started using https recently, any chance that this will cause problems while reprocessing the logs?
Anything else I need to do besides copying the GeoLite file to the plugin folder and moving the log files?
Actually you do not have to delete the data from the metrics table – the entries with the same log file name, i.e. the load_id column in the DB table metrics, will be overwritten. When processing a file, the function _loadData is called (see pkp-lib/PKPUsageStatsLoader.inc.php at ojs-stable-3_0_2 · pkp/pkp-lib · GitHub), which in turn calls $metricsDao->purgeLoadBatch($loadId) and removes all entries with that load_id. Thus, maybe double-check that the file names are identical to the load_ids.
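For example, you could compare the archived file names with the load_id values that are already in the metrics table before moving anything (the usageStats path under your files directory and the DB name/user are just assumptions, adjust them to your setup):

```
# List the archived usage log files (adjust the path to your files_dir)
ls /var/www/files/usageStats/archive/

# List the load_id values already stored in the metrics table
mysql -u ojs_user -p ojs_db -e "SELECT DISTINCT load_id FROM metrics;"
```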
Yes, remove them. But to be on the safe side, please make a backup of everything first!!!
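For example (the paths, database name and user below are only examples, adjust them to your setup):

```
# Dump the database and archive the usage stats files before touching anything
mysqldump -u ojs_user -p ojs_db > ojs_db_backup_$(date +%Y%m%d).sql
tar -czf usageStats_backup_$(date +%Y%m%d).tar.gz /var/www/files/usageStats/
```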
The time limit is defined by your server, in php.ini. So maybe do it step by step, or with a cron job, where the time limit is not so strict?
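As a sketch, you could check the current limit and set up a real cron job that runs the scheduled tasks from the command line, where max_execution_time is usually not enforced (the OJS path and the plugin's scheduled tasks XML path are assumptions, please verify them in your installation):

```
# Check the current PHP time limit (CLI usually has max_execution_time = 0, i.e. no limit)
php -i | grep max_execution_time

# Example crontab entry: run the OJS scheduled tasks once per hour via the CLI
# (the plugin's scheduled tasks XML path is an assumption; verify it in your installation)
0 * * * * cd /var/www/ojs && php tools/runScheduledTasks.php lib/pkp/plugins/generic/usageStats/scheduledTasks.xml
```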
Yes, because of the first answer above (entries are overwritten per load_id), I think you could reprocess just a part of the log files.
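For example, to reprocess only 2016, you could move just those files back into the stage folder (the folder layout under your files directory and the usage_events_YYYYMMDD.log naming are assumptions, please check how your files are actually named):

```
# Move only the 2016 log files from the archive folder back to the stage folder
# (adjust the path to your files_dir; the file naming is an assumption)
cd /var/www/files/usageStats
mv archive/usage_events_2016*.log stage/
```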
Hmmm… Here I am not sure :-\ I think only your base_url configuration in config.inc.php has to correspond to the URLs logged in the log files.
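So maybe check what is currently configured and compare it with the URLs in the old log files, e.g. like this (the journal URL is only an example):

```
# Check the base_url currently set in config.inc.php
grep "^base_url" /var/www/ojs/config.inc.php
# It should correspond to the protocol in the logged URLs, e.g.
#   base_url = "http://journals.example.org"   for the old http log files
#   base_url = "https://journals.example.org"  after the switch to https
```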
Are your plugin settings as desired, i.e. city and region? The GeoLite file goes into the usage statistics plugin folder in pkp-lib, I think. Otherwise, I think that's it.
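As a sketch only, since I am not 100% sure about the exact folder (please verify the target path in your installation):

```
# Copy the GeoLite city database into the usage statistics plugin folder
# (the target path is an assumption; verify where your installation expects it)
cp GeoLiteCity.dat /var/www/ojs/lib/pkp/plugins/generic/usageStats/
```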
It would be best if you could test it all first on a test installation, but of course this would mean more work :-\
Hmmm… Maybe do it in parts and adapt the config URLs each time. That would mean that you would either have to stop using the current production system during the processing, or you could do it somewhere else (with a DB dump and a copy of the files from that production system) and then somehow transport the results into the real DB table metrics? Hmmm…
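If you go that way, one possible sketch for transporting the results from the test DB into the production metrics table (database names, credentials and the load_id pattern are only examples):

```
# On the test system: dump only the reprocessed rows of the metrics table
mysqldump -u ojs_user -p --no-create-info \
  --where="load_id LIKE 'usage_events_2016%'" ojs_test metrics > metrics_2016.sql

# On production: remove the old rows for those load_ids, then import the dump
mysql -u ojs_user -p ojs_prod -e "DELETE FROM metrics WHERE load_id LIKE 'usage_events_2016%';"
mysql -u ojs_user -p ojs_prod < metrics_2016.sql
```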