The statistics of visits to the abstract are not recorded

sergiobm · September 24, 2024, 2:58pm

In Sandbox, I have 215 results for both queries. So, after the upgrade, three more were added.

sergiobm · September 24, 2024, 2:58pm

1 - I got 218 results
2 - I also got 218 results

(Live environment)

Gustavo_Leon · September 24, 2024, 6:46pm

Hi @bozana I apologize for the slowness of my responses I have a different time zone.

The files do work fine both in the forntend and in the OJS panel, I also checked if any logs are processed correctly after the update but there is nothing all are going to rejected.

The upgrade process, if it was performed to the latest version of 2.4.8 and then to 3.2.1 and finally to 3.3.0.17, in each upgrade I make sure that everything works correctly, however I did not verify that the statistics worked in 3.2 because I did not want to have my site offline for a long time.

The files that are showing errors are indeed galleys, they can be the text of the article or complementary files but they are always galleys, and they are fully functional, that is what I find very strange.

My system is already online for about 1 month, and that is where the visits are recorded only downloads, generate a test environment to diagnose the error.

I tried the querys that you kindly provided and I get 7917 results, it is normal since my OJS is multijournal and has more than 40 journals.

I am not sure if it would work or is the right way to solve it, but I understand that the assoc_id refers to the galley id (the representation object), and the asocc_type refers to the type of file it is, article text, complementary, etc. and it must also have the file status in 10, if we can correct this inconsistency the file will be processed correctly, right?

Yesterday I modified a log only with visits and it was processed correctly or at least that is what the log said, however the statistics of visits were not processed, maybe I chose the correct entries to make that test.

bozana · September 25, 2024, 12:51pm

Hi @Gustavo_Leon,

I am not sure if it would work or is the right way to solve it, but I understand that the assoc_id refers to the galley id (the representation object), and the asocc_type refers to the type of file it is, article text, complementary, etc. and it must also have the file status in 10, if we can correct this inconsistency the file will be processed correctly, right?

Yes, you are right!

Yesterday I modified a log only with visits and it was processed correctly or at least that is what the log said, however the statistics of visits were not processed, maybe I chose the correct entries to make that test.

What do you mean with “statistics of visits were not processed”? Do you not have the entries for those visits in your DB table metrics?’

I hoped that assoc_id are not missing for you both, but it seems they are missing.
I will thing about an SQL query that could help you solve that problem – enter the missing assoc_id and assoc_type values…

Best,
Bozana

sergiobm · September 26, 2024, 5:03pm

Hi @bozana,

For your information, just one usage_events file (Sep 07th) has been processed since September 04th. All the remaining, including September 04th, are in the stage folder now. They are being moved to the Processing folder and returned to Stage daily.
In the scheduledTasksLogs folder, I’m seeing in the UsageStatesLoader files with this information:

[2024-09-25 19:38:50] [Aviso] Iniciada tarefa agendada.
[2024-09-25 19:38:50] [Erro] O diretório …/files/usageStats/processing não está vazio. Isto poderia indicar que um processo falhou anteriormente, ou um processo está executando atualmente. Este arquivo será automaticamente reprocessado se você estiver usando oscheduledTasksAutoStage.xml, caso contrário, você precisará mover manualmente os arquivos órfãos no diretório de processamento de volta para o diretório principal.

I hope your script will fix that. Thanks

sergiobm · September 30, 2024, 6:04pm

Hi @bozana, any feedback on this thread? Thanks!

bozana · October 1, 2024, 12:53pm

Hi @sergiobm and @Gustavo_Leon,

here are the SQLs that you can execute to fix the missing assoc_type and assoc_id in your DB table submission_files. Those are SQLs for MySQL database. If you have PostgreSQL database the SQLs should be slightly different.
Please test the SQLs first on a test installation/server!!!
After executing both SQLs you could try to (re)process one log file that did not work earlier.

Here the SQLs:

UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
SET sf.assoc_type = 521, sf.assoc_id = pg.galley_id
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NULL

This first SQL will fix the rows that have NULL in both columns assoc_type and assoc_id.

UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id AND pg.galley_id = sf.assoc_id
SET sf.assoc_type = 521
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NOT NULL

This second SQL will fix the rows where only assoc_type is missing but there is assoc_id.

Best,
Bozana

sergiobm · October 1, 2024, 4:29pm

bozana:

UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
SET sf.assoc_type = 521, sf.assoc_id = pg.galley_id
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NULL

Thanks. I’ll try on our Sandbox. To reprocess a log file, must we move it to the processing folder?

sergiobm · October 1, 2024, 4:31pm

Hi @bozana, I ran the SQL in my Sandbox:

1 - select sf.submission_file_id, sf.file_stage, sf.assoc_type, sf.assoc_id from submission_files sf left join publication_galleys pg on (pg.galley_id = sf.assoc_id) where sf.file_stage = 10 AND assoc_type IS NULL AND pg.galley_id IS NULL

This query returned 215 rows.

2- UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
SET sf.assoc_type = 521, sf.assoc_id = pg.galley_id
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NULL

This query updated 93 rows (122 rows were unchanged)

3 - UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id AND pg.galley_id = sf.assoc_id
SET sf.assoc_type = 521
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NOT NULL

This query ran successfully but did not change any row.

bozana · October 1, 2024, 5:05pm

Hi @sergiobm

What do you see when you execute this SQL:

SELECT sf.submission_file_id, pg.galley_id, sf.submission_id FROM `submission_files` sf
LEFT JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
WHERE sf.file_stage = 10 AND assoc_id IS NULL AND assoc_type IS NULL

sergiobm · October 1, 2024, 5:43pm

There are 122 rows without galley_id

bozana · October 2, 2024, 10:54am

@sergiobm, could you investigate those submission and files in the editorial backend?
I do not know how this could happen…

sergiobm · October 2, 2024, 4:19pm

Hi @bozana, how can I quickly identify which journal ID would belong to each submission? My portal has more than 40 journals inside. I’m assuming the submission_id is the one showing in the URL (it seems to me that submission_file_id is like an internal number).

Gustavo_Leon · October 4, 2024, 1:27am

Hi @bozana

I used the query provided, in my case there were more than 6 records that were updated.

UPDATE submission_files sf
INNER JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
SET sf.assoc_type = 521, sf.assoc_id = pg.galley_id
WHERE sf.file_stage = 10 AND assoc_type IS NULL AND assoc_id IS NULL

The second query did not find matches, however I decided to reprocess some files and they were reprocessed correctly and the views also show data

However, when using this query I also have some records, I have checked and the submissions and files are fully functional.

SELECT sf.submission_file_id, pg.galley_id, sf.submission_id FROM submission_files sf
LEFT JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
WHERE sf.file_stage = 10 AND assoc_id IS NULL AND assoc_type IS NULL

sergiobm · October 9, 2024, 10:35pm

Hi @bozana, any tip to identify the information? Thanks

sergiobm · October 10, 2024, 3:35pm

Hi @bozana, do I need to clean up or update the last run for UsageStatsLoader from DB table scheduled_tasks to force the metrics to rerun since the files are stuck in the stage folder? I ran the SQL you shared, but the abstract metrics are not being updated yet. Thanks

sergiobm · October 11, 2024, 12:20pm

After applying the updates to the database, I moved the file that was stuck in the processing folder to the stage. All the files in the stage folder were processed at night, and finally, the abstract metrics were updated. Thanks a lot for the help

bozana · October 14, 2024, 10:12am

Hi @sergiobm,

Glad to hear that!

And if you would like to investigate the missing galley IDs here is the SQL that would also show the journal path, so that you know to which journal the submission is belonging to:

SELECT sf.submission_file_id, pg.galley_id, sf.submission_id, j.path 
FROM `submission_files` sf
LEFT JOIN publication_galleys pg ON pg.submission_file_id = sf.submission_file_id
LEFT JOIN submissions s ON s.submission_id = sf.submission_id
LEFT JOIN journals j ON s.context_id = j.journal_id
WHERE sf.file_stage = 10 AND assoc_id IS NULL AND assoc_type IS NULL

Best,
Bozana