There was a stats problem in OJS 220.127.116.11 where the total Galley Views would always be 0. We have since upgraded to 18.104.22.168 and the Galley Views are now working. However, I want to process the old log files to try to recover some of the galley view stats. My understanding is that we have to use the Apache access logs to do this, as the usage_event logs will not work.
I have moved the access logs into the usageStats/stage folder.
Then I modified the scheduled_tasks.last_run value in the DB for the UsageStatsLoader to be older than 24 hours ago.
Then I ran:
php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksExternalLogFiles.xml
However the stats are not processed. The access logs do get deleted from the stage folder.
I’m fairly sure the access logs are in the correct format:
22.214.171.124 - - [12/Sep/2016:04:42:29 -0600] "GET /index.php/ewjus/article/view/227/95 HTTP/1.1" 200 17026
126.96.36.199 - - [12/Sep/2016:05:05:30 -0600] "GET /index.php/ewjus/article/download/218/86 HTTP/1.1" 200 236740
Thanks. In the scheduledTaskLogs folder I get: The line number 1 from the file /journals-data/uploads/www.ewjus.com/usageStats/processing/www.ewjus.com-access_log-20160814 is not a valid log entry and the file was rejected.
Your Apache logs are not (or at least the first line of that log is not) in the standard "Combined" format. Do the rest of the lines in the file look similar? Your Apache LogFormat is probably something like: LogFormat "%h %l %u %t \"%r\" %>s %b"
You should be able to construct a regular expression that works with your existing LogFormat, with the exception that you don't appear to be capturing the userAgent string in your logs, so that information will be lost. A functional regex should be something like: /^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" (?P<returnCode>\S+) \S+(?P<userAgent> *?)/
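If it helps, here is a rough way to check a regex like the one above against a log line before handing the files to the plugin. This is only a sketch in Python (not how OJS itself parses logs); it relies on Python's `re` module accepting the same `(?P<name>...)` group syntax. The `parse_line` helper and the `192.0.2.1` address are made up for illustration, and it assumes the real log file uses plain straight double quotes around the request.

```python
import re

# The regex suggested above, without the surrounding / delimiters.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" '
    r'(?P<returnCode>\S+) \S+(?P<userAgent> *?)'
)


def parse_line(line):
    """Hypothetical helper: return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Sample line in the same shape as the ones posted; the IP is a placeholder.
SAMPLE = (
    '192.0.2.1 - - [12/Sep/2016:04:42:29 -0600] '
    '"GET /index.php/ewjus/article/view/227/95 HTTP/1.1" 200 17026'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

With a log format like this one, the `userAgent` group always comes back empty, which matches the caveat above. A quick pass of every line in a file through `parse_line` would also show whether only the first line is failing or the whole file.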