Is it safe to move log files from the dispatch folder to stage when there are no records in the jobs table and the logs in dispatch are not being processed?

Describe the issue or problem

There are 180 log files in the dispatch folder. The site is configured to process them on web requests. Yesterday this was working, but after some time it stopped, leaving those files behind.

No records in jobs table.

Steps I took leading up to the issue

  1. I discussed it with an AI assistant, which said to move the files from dispatch to stage.
  2. I tried that, and the files were then processed.
  3. I looked in the docs but could not find what the AI suggested. I do see “If you see log files in dispatch but not in archive, the scheduled task is running but the jobs are not processing — skip to step 4.” https://docs.pkp.sfu.ca/admin-guide/en/statistics
  4. I checked the failed_jobs table: there are no failed jobs. I guess that is the same as checking from the administration menu.

What application are you using?
OJS 3.5.0.3

So the question is: is it safe to move the files from dispatch to stage and let them be reprocessed on the next request? Could that corrupt data?

Hi @dvtech

What is your job configuration, from the config.inc.php?
Do you collect daily or only monthly statistics?
If you do not see any entries in your DB tables jobs and failed_jobs, then please reprocess the files month by month. See this documentation on how to do it: https://docs.pkp.sfu.ca/admin-guide/en/statistics#reprocess-log-files. Take the oldest month first: move all the log files for that month to the stage folder and run that CLI script.
Then check what happens: whether the files are copied to the dispatch folder again and whether the job exists in the jobs table. It is a job chain, and on a request (if configured so) the job chain will be run. For more details about log file processing, please see https://docs.pkp.sfu.ca/dev/documentation/en/statistics#how-stats-are-compiled.
You can also check the metrics_* DB tables, e.g. metrics_submission, if there are entries for those log files.
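To work through the files month by month, oldest first, it can help to group them before moving anything. The following is only a minimal sketch: it assumes the log files are named like usage_events_YYYYMMDD.log (please check your own files), and groupLogFilesByMonth is a hypothetical helper, not part of OJS.

```php
<?php
// Sketch: group usage-event log files by month, oldest month first.
// ASSUMPTION: file names follow the pattern usage_events_YYYYMMDD.log.
// groupLogFilesByMonth() is a hypothetical helper, not an OJS function.
function groupLogFilesByMonth(array $fileNames): array
{
    $byMonth = [];
    foreach ($fileNames as $name) {
        // Capture year and month; anything not matching the pattern is skipped.
        if (preg_match('/^usage_events_(\d{4})(\d{2})\d{2}\.log$/', $name, $m)) {
            $byMonth[$m[1] . '-' . $m[2]][] = $name;
        }
    }
    // "YYYY-MM" keys sort chronologically when compared as strings.
    ksort($byMonth, SORT_STRING);
    foreach ($byMonth as &$files) {
        sort($files); // oldest day first within each month
    }
    unset($files);
    return $byMonth;
}
```

The first key of the returned array is the oldest month, and its value lists the files to move to the stage folder for that round.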

Let us know what you find out…

Best,
Bozana

Hi @bozana ,

What is your job configuration, from the config.inc.php?

default_connection = "database"
default_queue = "queue"
job_runner = On
job_runner_max_jobs = 2
job_runner_max_execution_time = 3000
job_runner_max_memory = 880
delete_failed_jobs_after = 10

Do you collect daily or only monthly statistics?

daily

run that CLI script

I have no CLI access. What I am doing now in a testing environment is changing classes/scheduler/Scheduler.php so that the task runs more frequently instead of daily:

$usageEvent = $this // VU: captured into a variable for the frequency override below
    ->schedule
    ->call(fn () => (new UsageStatsLoader([]))->execute())
    ->daily()
    ->name(UsageStatsLoader::class)
    ->withoutOverlapping();
VUScheduleConfig::applyUsageStatsLoaderFrequency($usageEvent); // VU

public static function applyUsageStatsLoaderFrequency(Event $event): void
{
    $everyMinutes = (int) Config::getVar('schedule', self::USAGE_STATS_LOADER_EVERY_MINUTES_KEY, 0);
    if ($everyMinutes >= 1 && $everyMinutes <= 59) {
        $event->cron('*/' . $everyMinutes . ' * * * *');
    }
    VULogger::log(
        'VU: schedule frequency applied. key=' . self::USAGE_STATS_LOADER_EVERY_MINUTES_KEY
        . ' expression=' . $event->expression
    );
}

but in the docs I see

Note: This requirement only applies if the site is configured to keep monthly statistics only. If daily statistics are kept, you can reprocess individual days without needing all log files for the month.

So do I really need to process the whole month's logs instead of just the ones that were not processed?

Overall this seems to be working, at least in testing. I was just not sure whether it is working correctly: the AI suggested moving the files to stage, I did that, and I wanted to verify that the AI is right.

Actually, I mostly tested with a hardcoded call like everyTwoMinutes(), and only recently changed it to read the value from config this way. The only problem is that I am modifying source files. With CLI access I would not need that, so maybe that is a reason to ask for CLI access, unless you know a better way without it.
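For reference, the range check in applyUsageStatsLoaderFrequency can be sketched as standalone functions to see which cron expression a config value produces. This is only an illustration: everyMinutesCron and minutesFired are hypothetical names, not part of OJS or Laravel. One thing it makes visible is that a */N minute field restarts at minute 0 of every hour, so values that do not divide 60 give uneven intervals.

```php
<?php
// Hypothetical helpers mirroring the 1..59 range check; not OJS or Laravel API.

// Returns the cron expression for "every N minutes", or null when N is
// out of range (in which case the default daily() schedule stays in place).
function everyMinutesCron(int $everyMinutes): ?string
{
    if ($everyMinutes < 1 || $everyMinutes > 59) {
        return null;
    }
    return '*/' . $everyMinutes . ' * * * *';
}

// Minutes of each hour at which a "*/N" minute field fires.
// Returns an empty list for out-of-range values.
function minutesFired(int $everyMinutes): array
{
    if ($everyMinutes < 1 || $everyMinutes > 59) {
        return [];
    }
    return range(0, 59, $everyMinutes);
}
```

For example, a value of 45 fires at minutes 0 and 45 of each hour, so the gaps alternate between 45 and 15 minutes. Values such as 2, 5, 10, 15, 20 or 30 give even spacing.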

Hi @dvtech

In that case you do not need to process them month by month. Theoretically you could copy them all into the stage folder. I am just not sure whether this could cause a problem for the job chain that is dispatched, i.e. whether it all fits into that DB column in the jobs table.
Maybe try reprocessing just a few files first and see if it all works as it should. Also very important: start the next batch (i.e. move the other log files to the stage folder) only once the previous batch of files is fully finished (all the jobs in the chain have finished)! Otherwise the temporary usage stats DB table will have too many entries and a timeout could occur.
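The batch-by-batch approach could look roughly like the sketch below. It is only an illustration under stated assumptions: moveNextBatch is a hypothetical helper, the folder paths are passed in by you, and it only checks that the stage folder is empty. It cannot see the jobs table, so you still need to confirm there that the previous job chain has finished before calling it again.

```php
<?php
// Sketch: move at most $batchSize log files from dispatch to stage, oldest first.
// moveNextBatch() is a hypothetical helper, not part of OJS.
// It does NOT check the jobs table; verify there that the previous chain is done.
function moveNextBatch(string $dispatchDir, string $stageDir, int $batchSize): array
{
    // Do nothing while files from a previous batch are still waiting in stage.
    $waiting = glob($stageDir . '/*.log') ?: [];
    if (count($waiting) > 0) {
        return [];
    }

    $files = glob($dispatchDir . '/*.log') ?: [];
    sort($files); // date-stamped file names sort chronologically

    $moved = [];
    foreach (array_slice($files, 0, max(0, $batchSize)) as $path) {
        $target = $stageDir . '/' . basename($path);
        if (rename($path, $target)) {
            $moved[] = basename($path);
        }
    }
    return $moved; // names of the files that were actually moved
}
```

Starting with a small batch size (a few files) matches the advice above to test with a few files first. The function returns an empty list when stage is not yet empty, so calling it too early moves nothing.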

Best,
Bozana