Hi @huc-di-infra and others,
Yes, the statistics jobs are ‘complicated’. Statistics jobs HAVE TO be executed one after another, not in parallel; this is ensured using a chain. Otherwise the stats can be wrong, or the temporary tables grow too large. Similarly, if several log-file reprocessings (via UsageStatsLoader) are started before the previous one has finished, the temporary tables grow too big and the processing becomes almost impossible. The temporary tables should only contain entries from one log file at a time; otherwise the processing is too heavy and can lead to failures. Thus, if a file processing fails, it is sometimes only possible to remove the entries from the temporary tables and restart the log-file processing (with UsageStatsLoader) from scratch – not to restart/retry the jobs.
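To illustrate the idea of the chain (this is only a minimal generic sketch, not the actual OJS code; the class and names are hypothetical): jobs run strictly one after another, and a lock refuses to start a new run while a previous one is still in progress.

```python
import threading

class JobChain:
    """Minimal sketch of chained job execution: jobs run strictly
    one after another, and a second run cannot start while a
    previous run is still in progress. (Illustrative only --
    not the actual OJS statistics-job implementation.)"""

    def __init__(self):
        self._lock = threading.Lock()

    def run(self, jobs):
        # Refuse to start if a previous chain has not finished yet,
        # mirroring why parallel reprocessing must be avoided.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("A previous run has not finished yet")
        try:
            for job in jobs:
                job()  # each job starts only after the previous one completed
        finally:
            self._lock.release()
```

The lock here stands in for whatever mechanism the real chain uses; the point is simply that overlap is rejected rather than allowed to corrupt the shared temporary tables.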
So I think the log-file processing and the statistics jobs should be monitored regularly.
We also found out that even increasing the timeout to 600 seconds may not be enough for huge installations and log files, for the job that reads the entries from the log file and saves them into the temporary tables. We will work on improving that, but I am not sure when, because we have so many other high priorities right now.
Finally, if you are interested, see this information on how the processing of the usage stats log files works: Daily usage events log not loading into db - #4 by bozana.
Best,
Bozana