Usage Stats Loader - Multiple tasks running

Hi,

I’m using PostgreSQL and we have a lot of requests, something like 3 or 4 per second.
Every day I see between 2 and 4 connections in my database processing usage data.

Is that OK? I would expect only one to be running.

On days with 4 connections I receive 3 failure e-mails.
On days with 3 connections I receive 2 failure e-mails.
Always one less.
But inside the database, in the metrics table, the file shows as processed.

Could these multiple tasks be altering the usage information, multiplying the actual amounts? Or is it just wasted processing?

I am afraid we are multiplying usage.

Tarcisio Pereira.

Tarcisio, there is no duplication. When the system can’t move a file during stats processing, it ends with a fatal error. See the FileLoader class, line 305; I think that’s where your error is coming from.
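
Simplified, that step does something like this (just a sketch of the idea, not the actual FileLoader code):

// Sketch only, not the real FileLoader: if the file can't be renamed
// into the archive directory (e.g. another task already moved it),
// the task dies with a fatal error.
$source = $processingDir . '/' . $filename;
$target = $archiveDir . '/' . $filename;
if (!rename($source, $target)) {
    die("The file $filename could not be moved from $source to $target");
}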

Can you also paste here the error e-mails you received? Thanks.

Just this:
##admin.scheduledTask.downloadLog##

And inside the log file:
[2015-04-28 22:21:09] [Notice] Task started.
[2015-04-29 04:18:16] [Error] The file usage_events_20150417.log could not be moved from /data/files-ojs/usageStats/processing/usage_events_20150417.log to /data/files-ojs/usageStats/archive/usage_events_20150417.log

The error happens when it tries to move the file to the archive, and only a long time after the task starts.

This makes me believe that multiple tasks are accessing the same file and multiplying the usage counts.

When I see 3 tasks, after the two errors the file does end up inside the archive folder.

Almost every day.

Thank you for your time.

OK, this is something different. You have 3 e-mails with the same filename, right? I will investigate, but even if there are multiple processing tasks for the same file, the entries will be merged or not counted, because they use the same load id. I’ll get back to you when I have something more accurate. Thanks.

Bruno.

I forgot to mention:

I have one log file per task.
2 or 3 tasks start every day, so… 2 or 3 files.

Just one file is OK; the others show the error and stop.
But it happens a long time after they start.

Tarcisio, even if there is more than one task processing the same file at the same time, there will be no stats duplication, because the processing already excludes requests from the same user that are too close together in time. So your current situation is not ideal, but the data you have in your database is correct.

This is happening because PostgreSQL doesn’t seem to support the function we are using to avoid those multiple runs. So, until we come up with a solution that works for PostgreSQL, you can disable the acron plugin and create cron jobs on your server to execute the tasks you want, just like in the old days before the acron plugin.

The stats loader task can be executed with this line:

php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksAutoStage.xml

If you create a cron job that executes this line every day, you will process the stats without this multiple processing problem.
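
For example, a crontab entry like this would run it once a day at 1:00 (the installation path here is just a placeholder; adjust it to your server):

# hypothetical schedule; replace /path/to/ojs with your installation path
0 1 * * * cd /path/to/ojs && php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksAutoStage.xml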

Thanks,
Bruno

Bruno,

Thank you very much.
I will keep it as it is. Double processing is not a problem if it doesn’t double the usage.

Tarcisio Pereira.

Bruno,

Excuse me if I seem repetitive, but are you sure there is really no possibility of the access counts being multiplied?
I am reprocessing the files and the total usage is decreasing instead of increasing.

I suspect that something is wrong here. I’m confused.

Tarcisio Pereira.

Hi Tarcisio,

Reprocessing should give you exactly the same numbers, neither increasing nor decreasing. How are you obtaining the numbers you are comparing?

Thanks,
Bruno

Hi Bruno,

I’m reprocessing 10 files; it takes 3 days to process them on my server.
After processing the 10 files plus the 3 new ones, the total is lower than it was before with just the 10 files.

I’m talking about a difference of almost 1,000,000.

Gratefully,

Tarcisio Pereira.

Tarcisio,

What is it that you call the total? The total number of metrics table entries? The total stats for an object?

Bruno

Bruno,

Both. I select the sum of the metric column for everything dated in 2015 (roughly the query below).
When I opened this topic it was almost 5,000,000.
Now it is something like 4,000,000.
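
Something like this (assuming the standard OJS metrics table layout; the exact column names may differ by version):

-- approximate; column names may differ across OJS versions
SELECT SUM(metric) FROM metrics WHERE month LIKE '2015%';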

I reprocessed all the log files dated after the migration to PostgreSQL.

Tarcisio Pereira.

Tarcisio,

Thanks for your help on this. I will set up a PostgreSQL environment and test this specific case. I could bet it would not duplicate the stats, but your experience is telling a different story. As soon as I have news on this I’ll let you know.

Thanks again,
Bruno

Bruno, it is I who should thank you for your effort.
All of you work almost miracles while I help so little.
It may be hard to reproduce, because we have 4-5 requests per second.
I believe it is multiplying and getting confused because all the tasks use the same temporary table with the same load_id.

I was thinking of using locks so that only one request at a time could access the method that starts the processing, but many requests would pile up there, compromising performance.
I also thought of using a global flag to indicate that processing is already running, so it would not start again, but that is more of a palliative than a definitive solution.
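
For example, a non-blocking lock that skips the run instead of queueing it (just a rough sketch with a hypothetical lock file path, not OJS code):

// Take a non-blocking advisory lock so only one task processes at a time.
$lock = fopen('/tmp/usage-stats-processing.lock', 'c');
if (!$lock || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit; // another task is already processing; skip instead of queueing
}
// ... run the usage stats processing here ...
flock($lock, LOCK_UN);
fclose($lock);

With LOCK_NB the extra tasks would exit immediately instead of waiting, so at least the requests would not pile up.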

Perhaps the best solution is to use acron for all the other tasks and schedule this one in cron.

I honestly don’t know yet…

Tarcisio Pereira.

I often receive the same e-mail:

[2015-05-01 15:24:04] [Notice] Task process started.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150420.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150417.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150423.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150424.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150428.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150422.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150429.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150427.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150430.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150415.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150419.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150421.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150414.log was processed and archived.
[2015-05-01 15:24:04] [Notice] File /home/escijou1/scirev/usageStats/processing/usage_events_20150425.log was processed and archived.
[2015-05-01 15:24:04] [Notice] Task process stopped.

Hi Bruno!

Any news or ideas?

Tarcisio Pereira

Hi Bruno,
I’m going to use your cron alternative.
Do I just need to cron

php /var/www/revistas/tools/runScheduledTasks.php

And

php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksAutoStage.xml

?

Or is there something missing?

@Tarcisio_Pereira,

That’s right. Remember to disable the acron plugin, OK?
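
A crontab could look something like this, for example (hypothetical times, using the path you mentioned; adjust to your setup):

# hypothetical times; paths from your message, adjust as needed
0 0 * * * cd /var/www/revistas && php tools/runScheduledTasks.php
0 1 * * * cd /var/www/revistas && php tools/runScheduledTasks.php plugins/generic/usageStats/scheduledTasksAutoStage.xml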

Thanks,
Bruno

Bruno,

Thank you very much!