Unable to import external (Apache combined format) log files

Hi all,

Following the PKP_Statistics_Framework instructions, I put some test excerpts from my Apache log file containing downloads into the stage directory and ran:

php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksExternalLogFiles.xml

The process finished without errors (the file was moved to archived/ and nothing appeared in the rejected/ folder), but the metrics table does not contain the expected data.

The log file is in combined format.
I'm using OJS 2.4.8-1.

Example from apache log file used as input:

AAA.BBB.CCC.DDD - - [10/Oct/2016:10:54:12 +0300] "GET /JOURNALNAME/article/download/191/195 HTTP/1.1" 200 9724643 "http://X.Y.Z/JOURNALNAME/article/view/191/195" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"

Any ideas what might be wrong?

Thanks in advance,
Theodoros Theodoropoulos

My site is not a fresh installation. It has been upgraded several times and is now on the latest stable version.

I suspect that the regex value currently stored under JOURNALNAME/manager/plugin/generic/usagestatsplugin/settings

/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/

differs from the one in /plugins/generic/usageStats/settings.xml and from the (new) code in /generic/usageStats/UsageStatsLoader.inc.php, and may need to be changed manually because its grouping is different.

The first produces these groups:

Group1: AAA.BBB.CCC.DDD
Group2: 10/Oct/2016:10:54:12 +0300
Group3: GET <-- !!
Group4: http://X.Y.Z/hydrotechnica/article/view/191/195
Group5: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
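To double-check this grouping outside OJS, the stored pattern can be run against the sample line with a few lines of Python. This is only a sketch: Python's re module is not PHP's PCRE, though it should treat this particular pattern the same way, and I assume the curly quotes above are forum formatting for the plain double quotes in the real log and regex. Note that the "\d+ \d+" for status and size are not in parentheses, so the return code is never captured.

```python
import re

# Sample line from the post, with plain double quotes as Apache writes them.
LOG_LINE = (
    'AAA.BBB.CCC.DDD - - [10/Oct/2016:10:54:12 +0300] '
    '"GET /JOURNALNAME/article/download/191/195 HTTP/1.1" 200 9724643 '
    '"http://X.Y.Z/JOURNALNAME/article/view/191/195" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"'
)

# The regex found in the plugin settings, with the PHP /.../ delimiters removed.
STORED_REGEX = r'^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"'

match = re.match(STORED_REGEX, LOG_LINE)

# Print every positional group; group 3 is the HTTP method, not the URL.
for index, value in enumerate(match.groups(), start=1):
    print(f"Group{index}: {value}")
```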

The latter produces these groups:

Group date: 10/Oct/2016:10:54:12 +0300
Group url: /JOURNALNAME/article/download/191/195
Group returnCode: 200
Group userAgent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
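For comparison, a named-group pattern extracts the fields by name instead of by position. The pattern in this sketch is a hypothetical approximation written for illustration, not the exact one shipped in settings.xml; it uses the (?P<name>...) syntax, which both Python and PHP's PCRE accept.

```python
import re

LOG_LINE = (
    'AAA.BBB.CCC.DDD - - [10/Oct/2016:10:54:12 +0300] '
    '"GET /JOURNALNAME/article/download/191/195 HTTP/1.1" 200 9724643 '
    '"http://X.Y.Z/JOURNALNAME/article/view/191/195" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"'
)

# Hypothetical named-group pattern (an approximation, not copied from OJS).
# The request method is matched but not captured; the URL gets its own group.
NAMED_REGEX = (
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] '
    r'"\S+ (?P<url>\S+).*?" (?P<returnCode>\S+) \S+ '
    r'".*?" "(?P<userAgent>.*?)"'
)

match = re.match(NAMED_REGEX, LOG_LINE)
for name in ("date", "url", "returnCode", "userAgent"):
    print(f"Group {name}: {match.group(name)}")
```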

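According to the code in UsageStatsLoader.inc.php (lines 330 to 334), group 3 should hold the URL and group 4 the return code. With the stored regex, however, group 3 is the HTTP method, group 4 is the referrer, and the return code is not captured at all.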
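The effect of reading fields by position can be sketched like this. read_entry mimics the group 3 / group 4 mapping described above; is_countable is a hypothetical filter (only successful article downloads), not the plugin's real logic; and CORRECTED_REGEX is an illustrative pattern that puts the URL in group 3 and the return code in group 4.

```python
import re

LOG_LINE = (
    'AAA.BBB.CCC.DDD - - [10/Oct/2016:10:54:12 +0300] '
    '"GET /JOURNALNAME/article/download/191/195 HTTP/1.1" 200 9724643 '
    '"http://X.Y.Z/JOURNALNAME/article/view/191/195" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"'
)

# Regex saved in the plugin settings (PHP delimiters removed).
STORED_REGEX = r'^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"'
# Hypothetical corrected pattern: URL in group 3, return code in group 4.
CORRECTED_REGEX = r'^(\S+) \S+ \S+ \[(.*?)\] "\S+ (\S+).*?" (\S+) \S+ ".*?" "(.*?)"'


def read_entry(pattern, line):
    """Read fields by position: group 3 as the URL, group 4 as the return code."""
    m = re.match(pattern, line)
    if m is None:
        return None
    return {"url": m.group(3), "returnCode": m.group(4)}


def is_countable(entry):
    """Illustrative filter (an assumption): keep successful article downloads."""
    return (
        entry is not None
        and entry["returnCode"] == "200"
        and "/article/download/" in entry["url"]
    )


for label, pattern in (("stored", STORED_REGEX), ("corrected", CORRECTED_REGEX)):
    entry = read_entry(pattern, LOG_LINE)
    print(label, entry, is_countable(entry))
```

With the stored pattern the "URL" is GET and the "return code" is the referrer, so a filter of this kind silently drops the line rather than rejecting the file, which would match the symptom of an archived file with no metrics.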

Could this be what prevents the data from being loaded into the database?

[UPDATE] YES, this seems to be the problem!
Completely removing the regex in JOURNALNAME/manager/plugin/generic/usagestatsplugin/settings forces /generic/usageStats/UsageStatsLoader.inc.php to fall back to the default value with named groups, which works!

Please verify this on your installation. If it is confirmed, it should lead to a bug fix.

For a brief period, a bad regular expression was present in the product. Your analysis above is correct. The regular expression has since been corrected, but if you saved the Usage Statistics settings while the faulty regular expression was present, it was stored in your plugin settings and stays there until it is removed or corrected, as you have done.
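Anyone seeing the same symptom could check the saved value before re-running the import. The sketch below assumes the loader needs the four names listed earlier (date, url, returnCode, userAgent); missing_named_groups is a hypothetical helper that only strips simple /.../ delimiters (no flags) and relies on the (?P<name>...) group syntax.

```python
import re

# Names taken from the group list in the post; assumed to be what the loader needs.
REQUIRED_GROUPS = {"date", "url", "returnCode", "userAgent"}


def missing_named_groups(php_regex):
    """Return the required group names that a stored PHP-style regex lacks."""
    pattern = php_regex
    # Strip simple /.../ delimiters; trailing flags are not handled here.
    if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
        pattern = pattern[1:-1]
    return REQUIRED_GROUPS - set(re.compile(pattern).groupindex)


stored = r'/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/'
print(sorted(missing_named_groups(stored)))  # -> ['date', 'returnCode', 'url', 'userAgent']

candidate = (
    r'/^(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?)\] "\S+ (?P<url>\S+).*?" '
    r'(?P<returnCode>\S+) \S+ ".*?" "(?P<userAgent>.*?)"/'
)
print(sorted(missing_named_groups(candidate)))  # -> []
```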