OJS 3.3: what's the best approach to uploading a huge number of new galley files for already published submissions?

tmrozewski · September 27, 2024, 4:07pm

Hello, one of our journals was born in print and migrated to OJS with scans of its back file. Those back files were recently re-scanned at higher quality and with better OCR, and the journal would like to add them to the existing, published submissions. The editor estimates up to 1,000 PDFs. We’re currently on 3.3.0.19 and hope to upgrade to 3.5 sometime later next year.

What’s the best approach to this? I’d like to avoid having staff manually version, upload, and republish each submission (I don’t think we even have the staff hours for that). I don’t believe the API can currently handle this. I don’t need step-by-step instructions; I’d just like to be pointed in the right direction for a task like this. Thanks!

abadan · September 27, 2024, 9:32pm

Our team currently does this via the Quick Submit plugin, and as you can imagine, it’s quite laborious and time-consuming for that many articles.

When the metadata already exists in another system, it may be better to convert it to the XML format OJS accepts and import it. But we don’t usually have that option.

We are experimenting with how AI can help when we receive requests to enter large numbers of articles into OJS, which happens fairly often. Maybe in the future there will be less manual work…

asmecher · September 27, 2024, 10:26pm

Hi all,

If I were tackling this, I’d look directly at the filesystem and database and see whether it’s feasible to match the files there to issues. You’re only replacing files with higher-quality versions of the same content, so OJS doesn’t need to know; working through the process manually, or even via the API, will be far more time-consuming.
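A minimal sketch of the in-place swap, in Python. It assumes you already know the relative path stored for the file being replaced, and that stored paths resolve under the `files_dir` set in `config.inc.php` (verify both on your install). The function name `replace_in_place` and the backup layout are hypothetical. Because the path and the database row stay the same, nothing in OJS itself changes; this has not been tested against a real OJS instance, so try it on a copy first.

```python
import shutil
from pathlib import Path


def replace_in_place(files_dir: Path, stored_path: str, new_scan: Path, backup_dir: Path) -> Path:
    """Overwrite one stored galley file with a new scan, keeping its path.

    Assumptions (hypothetical, verify against your install):
    - files_dir is the files_dir value from config.inc.php;
    - stored_path is the relative path recorded for the file in the database.
    Returns the path of the backup copy of the original file.
    """
    target = files_dir / stored_path
    if not target.is_file():
        raise FileNotFoundError(target)

    # Keep a copy of the original under the same relative path,
    # so each replacement can be rolled back individually.
    backup = backup_dir / stored_path
    backup.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(target, backup)

    # Write the new scan to a temporary name in the same directory, then
    # rename it over the original (an atomic replace on POSIX filesystems),
    # so the file is never served half-written.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(new_scan, tmp)
    tmp.replace(target)
    return backup
```

Keeping the original path means the existing `files` row (and anything pointing at it) stays valid, which is what makes the "OJS doesn’t need to know" approach work.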

You’ll need to use the database to match submission IDs and file types to filenames. Look in the submission_files and files tables, which are linked by file_id.
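The matching step could look like the following Python sketch. It joins `submission_files` to `files` on `file_id`, keeps PDFs at the proof (galley) stage, and pairs each new scan with a stored path. Several details are assumptions rather than confirmed OJS facts: the `file_stage` value 10 for proofs, the `path` and `mimetype` column names, and the hypothetical naming convention `<submission_id>.pdf` for the new scans. It uses `sqlite3` for illustration only; on a real MySQL or PostgreSQL install you would run the same query through your driver, which may use `%s` rather than `?` as the placeholder.

```python
import sqlite3
from collections import defaultdict
from pathlib import PurePath

# Assumed value of OJS's SUBMISSION_FILE_PROOF constant (galley/proof stage);
# check it against your version's source before relying on it.
SUBMISSION_FILE_PROOF = 10

QUERY = """
SELECT sf.submission_id, f.path
FROM submission_files AS sf
JOIN files AS f ON f.file_id = sf.file_id
WHERE sf.file_stage = ? AND f.mimetype = 'application/pdf'
ORDER BY sf.submission_id
"""


def plan_replacements(conn: sqlite3.Connection, new_scans: list[str]):
    """Pair new scans with stored PDF files by submission ID.

    Hypothetical naming convention: each new scan is named '<submission_id>.pdf'.
    Returns (matches, problems): matches maps scan name -> stored relative path;
    problems lists (scan name, candidate count) when there are zero or several.
    """
    by_submission = defaultdict(list)
    for submission_id, path in conn.execute(QUERY, (SUBMISSION_FILE_PROOF,)):
        by_submission[submission_id].append(path)

    matches, problems = {}, []
    for scan in new_scans:
        stem = PurePath(scan).stem
        paths = by_submission.get(int(stem), []) if stem.isdecimal() else []
        if len(paths) == 1:
            matches[scan] = paths[0]
        else:
            # No stored PDF, or more than one: leave it for a human to decide.
            problems.append((scan, len(paths)))
    return matches, problems
```

Anything that lands in `problems` (for example, a submission with PDF galleys in several languages) is probably better resolved by hand than guessed.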

Regards,
Alec Smecher
Public Knowledge Project Team

tmrozewski · October 8, 2024, 6:57pm

Thanks @abadan, but that’s not quite the question: we’re not trying to ingest submissions for the first time, we’re trying to bulk-replace the galley files of existing, published submissions.

tmrozewski · October 8, 2024, 7:01pm

Thanks @asmecher I’ll investigate doing this via the database at some point.

I investigated using the REST API at the beginning of the year. My memory is fuzzy, but IIRC either the documentation indicated this wasn’t possible, or there was a problem finding the right database identifiers to make it happen. If I revisit this, I’ll post my findings here so that they’re at least documented and searchable on the forum.