Many files, what to do?

Vitor_R · November 1, 2019, 2:37pm

Dear colleagues,
We are migrating a journal that was hosted on SciELO. We have the SciELO-standard XML and the PDFs.
We are now at what I would call a critical stage.
Do you know the most effective way to migrate it into OJS?
We have approximately 6,600 PDFs (that is, articles) to bring into OJS.
Do you know of any method for deploying such a large number of files?
For example, a script that sends the files and/or metadata through the installation root folder.
I am really lost on this issue.
Regards,
Vitor

ctgraham · November 1, 2019, 2:49pm

When I have done this in the past, my approach has been to copy the PDFs to a temporary location on the server, then transform the XML to OJS native import XML, with the submission_files elements pointing to the local PDF copy by name. Breaking the import down by issue and testing it on a development server beforehand has been helpful.
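As a rough illustration of that approach, here is a minimal Python sketch (using the standard-library ElementTree in place of XSLT) that maps one SciELO article, which uses JATS-style elements, to an OJS native `<article>` fragment whose `<submission_file>` points at the PDF by file name. The OJS element and attribute names (`submission_file`, `revision`, `href`, `givenname`, `familyname`, `user_group_ref`) are assumptions modelled on the OJS 3.x native schema and differ between versions (3.1 uses `firstname`/`lastname` for authors, for example), so check them against the `native.xsd` shipped with your installation. The locale mapping and sample input are hypothetical, and a real import file also needs the `http://pkp.sfu.ca` namespace and an enclosing `<issue>` or `<articles>` element.

```python
import xml.etree.ElementTree as ET

# Tiny SciELO (JATS-style) sample; real SciELO PS files carry many more elements.
SCIELO_SAMPLE = """<article xml:lang="pt">
  <front>
    <article-meta>
      <title-group><article-title>Um título</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Silva</surname><given-names>Ana</given-names></name>
        </contrib>
      </contrib-group>
      <abstract><p>Resumo do artigo.</p></abstract>
    </article-meta>
  </front>
</article>"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Hypothetical mapping from SciELO language codes to OJS locales.
LOCALES = {"pt": "pt_BR", "en": "en_US", "es": "es_ES"}


def text_of(parent: ET.Element, path: str) -> str:
    """Concatenated text of the first element matching path, or ''."""
    found = parent.find(path)
    return "".join(found.itertext()).strip() if found is not None else ""


def scielo_to_native(scielo_xml: str, pdf_name: str) -> ET.Element:
    src = ET.fromstring(scielo_xml)
    locale = LOCALES.get(src.get(XML_LANG, "en"), "en_US")
    meta = src.find("front/article-meta")

    article = ET.Element("article", {"locale": locale})
    ET.SubElement(article, "title", {"locale": locale}).text = text_of(meta, "title-group/article-title")
    ET.SubElement(article, "abstract", {"locale": locale}).text = text_of(meta, "abstract")

    authors = ET.SubElement(article, "authors")
    for contrib in meta.findall("contrib-group/contrib[@contrib-type='author']"):
        author = ET.SubElement(authors, "author", {"user_group_ref": "Author"})
        ET.SubElement(author, "givenname", {"locale": locale}).text = text_of(contrib, "name/given-names")
        ET.SubElement(author, "familyname", {"locale": locale}).text = text_of(contrib, "name/surname")

    # The PDF is referenced by name, relative to the import XML's folder
    # (assumption: your OJS version resolves relative src values that way).
    sub_file = ET.SubElement(article, "submission_file", {"stage": "proof"})
    revision = ET.SubElement(sub_file, "revision",
                             {"number": "1", "genre": "Article Text", "filename": pdf_name})
    ET.SubElement(revision, "href", {"src": pdf_name, "mime_type": "application/pdf"})
    return article


if __name__ == "__main__":
    print(ET.tostring(scielo_to_native(SCIELO_SAMPLE, "0001.pdf"), encoding="unicode"))
```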

Vitor_R · November 4, 2019, 12:53pm

Thanks for the reply.

As I understand it, converting XML into OJS Native XML would only work if I already had OJS-standard XML?
Because the “straightforward” conversion from SciELO’s XML model to the OJS model produces several errors.
I searched various websites for a “simple” conversion method but only found one that is
still under development.
I really don’t have time to upload more than 6,000 PDFs one by one.
I came across this video: (import articles and issues in Open Journal Systems (OJS) via XML - YouTube)
As I understand it, maybe by editing the same fields that are edited in the video, I could upload my files?
Excuse me if this sounds a bit confusing; I’m just starting out in the CC area.
Regards,
Vitor

ctgraham · November 4, 2019, 5:10pm

If you have XML output from SciELO and someone skilled in XSLT, you can transform the SciELO XML into OJS XML. You can group the articles/issues into manageable sets.
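Grouping into manageable sets can be automated. The sketch below, assuming each article has already been reduced to a dict with hypothetical `volume` and `issue` keys, packs whole issues into batches of at most `max_articles` articles, so no issue is split across two import files; an issue larger than the cap gets a batch of its own. The default cap of 200 is an arbitrary placeholder to tune on a test server.

```python
from __future__ import annotations

from collections import defaultdict


def issue_sort_key(key: tuple[str, str]) -> tuple:
    # Numeric parts sort numerically ("2" before "10"); others (e.g. "suppl") sort after.
    return tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in key)


def batch_issues(articles: list[dict], max_articles: int = 200) -> list[list[tuple]]:
    """Pack whole issues into batches holding at most max_articles articles."""
    by_issue: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for art in articles:
        by_issue[(art["volume"], art["issue"])].append(art)

    batches: list[list[tuple]] = []
    current: list[tuple] = []
    count = 0
    for key in sorted(by_issue, key=issue_sort_key):
        issue_articles = by_issue[key]
        # Start a new batch if this issue would overflow the current one.
        if current and count + len(issue_articles) > max_articles:
            batches.append(current)
            current, count = [], 0
        current.append((key, issue_articles))
        count += len(issue_articles)
    if current:
        batches.append(current)
    return batches
```

Each batch is a list of `((volume, issue), articles)` pairs, ready to be written out as one import XML.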

With this done, you can upload all of the PDFs and the XMLs to the server via SFTP (or FTP, Samba, or similar).
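Before running the importer, it can save time to confirm that every file referenced in an import XML actually arrived on the server. This sketch, assuming the PDFs were uploaded into the same folder as their XML and are referenced by relative `src` attributes on `href` elements, lists the missing ones and skips remote URLs. That folder layout is an assumption for illustration, not an OJS requirement.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, e.g. '{http://pkp.sfu.ca}href' -> 'href'."""
    return tag.rsplit("}", 1)[-1]


def missing_files(xml_path: Path) -> list:
    """Relative href targets in xml_path that do not exist next to the XML file."""
    base = xml_path.parent
    missing = []
    for el in ET.parse(xml_path).iter():
        if local_name(el.tag) != "href":
            continue
        src = el.get("src", "")
        if "://" in src:
            continue  # remote URL: fetched by the importer, not checked here
        if not (base / src).is_file():
            missing.append(src)
    return missing


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        for src in missing_files(Path(name)):
            print(f"{name}: missing {src}")
```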

Then, from the command line, you can run the importer on each group of XML/PDFs. From your OJS install directory, run `php tools/importExport.php NativeImportExportPlugin usage` for more details.
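Running the importer over many groups is easy to script. The sketch below builds one command per XML file, assuming the `import [xmlFileName] [journal_path] [user_name]` form that OJS 3.x prints in its `usage` text; confirm it against your own output, since files whose root is `<article>`/`<articles>` rather than `<issue>`/`<issues>` may need extra parameters. The install directory, journal path, and user name are hypothetical placeholders. By default it only prints the commands (a dry run), and when it does run them it saves each import's output, because the exit status alone may not reveal every problem.

```python
import subprocess
from pathlib import Path

OJS_ROOT = Path("/var/www/ojs")   # hypothetical OJS install directory
JOURNAL_PATH = "myjournal"        # hypothetical journal path
IMPORT_USER = "admin"             # hypothetical OJS user name


def import_command(xml_file: Path) -> list:
    """Command line for one native import (argument order: check `usage`)."""
    return ["php", "tools/importExport.php", "NativeImportExportPlugin",
            "import", str(xml_file), JOURNAL_PATH, IMPORT_USER]


def run_imports(xml_files, dry_run: bool = True) -> list:
    """Run (or just print) one import per XML file; return files whose run failed."""
    failed = []
    for xml_file in sorted(Path(f) for f in xml_files):
        # Absolute path, because the command runs from the OJS directory.
        cmd = import_command(xml_file.resolve())
        if dry_run:
            print(" ".join(cmd))
            continue
        result = subprocess.run(cmd, cwd=OJS_ROOT, capture_output=True, text=True)
        # Keep the importer's output next to the XML so errors can be reviewed.
        xml_file.with_suffix(".log").write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            failed.append(xml_file)
    return failed


if __name__ == "__main__":
    run_imports(["batch_001.xml", "batch_002.xml"])  # dry run: prints commands
```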

twa · November 6, 2019, 8:50am

@Vitor_R We’re still in the process of importing our back issues going back to 2002, roughly 1,000 articles with PDF and HTML galleys, but we’re nearly done. As a data source we only had a Citavi file, which contains all the metadata, and our old website. We gathered all the needed information with a complex Python script that outputs OJS XML. With this process, nearly 90% of the articles could be imported automatically. The remaining 10% have to be done by hand.
We published an article about the process (https://tatup.de/index.php/tatup/article/view/234), but it’s not technical (and it’s in German, sorry!).
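The split between records that can be imported automatically and those that need manual work can be made explicit by checking each record before generating any XML. This is not the script described above; it is a minimal sketch with a hypothetical list of required fields that sorts records into an automatic pile and a manual pile, together with the reasons.

```python
# Hypothetical minimum a record needs before it can go into the generated XML.
REQUIRED_FIELDS = ("title", "authors", "year", "pdf")


def triage(records: list) -> tuple:
    """Split records into (importable, needs_manual_work_with_reasons)."""
    automatic, manual = [], []
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            manual.append((record, missing))
        else:
            automatic.append(record)
    return automatic, manual


if __name__ == "__main__":
    sample = [
        {"title": "A", "authors": ["Müller"], "year": 2001, "pdf": "a.pdf"},
        {"title": "B", "authors": [], "year": 1999, "pdf": "b.pdf"},
    ]
    auto, manual = triage(sample)
    print(f"{len(auto)} automatic, {len(manual)} manual")
    for record, missing in manual:
        print(record["title"], "missing:", ", ".join(missing))
```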

As ctgraham mentioned, this is done with the Native XML plugin, which lets you import whole issues including all submission files (of all types). To keep XML file sizes low, I’d upload all submission files to a web server and put only references to them in the XML. The plugin downloads the referenced files into OJS automatically during the import.
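To see why references keep the XML small: embedding a file means base64-encoding it inside the XML, which makes it roughly a third larger than the PDF itself, whereas a reference is one short element. The sketch builds both variants of a file element. The `revision`, `embed`, and `href` names are assumptions modelled on the OJS 3.x native schema (check `native.xsd` for your version), and the URL is a placeholder.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional


def revision_element(pdf_bytes: bytes, url: Optional[str] = None) -> ET.Element:
    """File revision that either references url or embeds the PDF as base64."""
    revision = ET.Element("revision", {"number": "1", "genre": "Article Text"})
    if url:
        # Reference: the importer fetches the file itself during import.
        ET.SubElement(revision, "href", {"src": url, "mime_type": "application/pdf"})
    else:
        # Embedding: every 3 bytes of PDF become 4 characters of XML text.
        embed = ET.SubElement(revision, "embed", {"encoding": "base64"})
        embed.text = base64.b64encode(pdf_bytes).decode("ascii")
    return revision


if __name__ == "__main__":
    fake_pdf = b"%PDF-1.4" + bytes(1_500_000)  # stand-in for a 1.5 MB article
    for url in (None, "https://files.example.org/0001.pdf"):
        size = len(ET.tostring(revision_element(fake_pdf, url)))
        print("href" if url else "embed", size, "bytes")
```

Multiplied over 6,600 articles, the embedded variant quickly produces import files that are awkward to upload and parse, which is the reason for the advice above.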

Don’t underestimate the work of transforming the SciELO XML dialect into the OJS XML dialect. It can be very time-consuming. Try to involve (and pay) an expert who has good skills in XML transformations (be it with XSLT, Python, or whatever), ideally combined with a basic understanding of how journals work.

ctgraham pointed you to the right place in his first post for details of the OJS XML dialect. You can also experiment by exporting existing issues/articles directly from your current OJS, which is quite helpful.
You’ll also need a corresponding test environment on a development server, where you can test extensively until everything works, before importing into your live OJS system.
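One way to use an export from an existing OJS is to compare its structure with your generated XML. This minimal sketch collects every element path (namespaces stripped) in both files and reports the paths that appear in the OJS export but never in your output. It only checks structure, not values or attributes, and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def element_paths(xml_text: str) -> set:
    """All root-to-element paths, e.g. '/issue/articles/article/title'."""
    paths = set()

    def walk(element: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{local_name(element.tag)}"
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def paths_missing_from(generated_xml: str, exported_xml: str) -> list:
    """Paths present in the OJS export but absent from the generated file."""
    return sorted(element_paths(exported_xml) - element_paths(generated_xml))


if __name__ == "__main__":
    exported = "<issue><articles><article><title/><authors/></article></articles></issue>"
    generated = "<issue><articles><article><title/></article></articles></issue>"
    print(paths_missing_from(generated, exported))
```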

Once the XML conversion is worked out, importing large quantities of articles/issues runs very smoothly.

Please consider providing your transformation scheme as open source on GitHub. This will grow the open-source community and help make OJS more flexible.

Regards, Tobias

ajnyga · November 6, 2019, 12:04pm

This is what our journals use for importing: GitHub - ajnyga/tsvConverter: Excel to OJS3 XML conversion tool

Of course, with this solution you still need to convert the XML into an Excel sheet as described in the README there.
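The step of turning the SciELO metadata into a spreadsheet can be scripted as well. This sketch writes one tab-separated row per article with Python’s `csv` module; Excel or LibreOffice can open the result and save it in whatever format the tool expects. The column names are placeholders, not tsvConverter’s actual headers, so take those from its README.

```python
import csv
import io

# Placeholder headers: replace with the exact columns listed in the tsvConverter README.
COLUMNS = ["issueVolume", "issueNumber", "title", "abstract",
           "authorGivenName1", "authorFamilyName1", "fileName"]


def records_to_tsv(records: list) -> str:
    """One tab-separated row per article; missing fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"issueVolume": "12", "issueNumber": "3", "title": "Um título",
             "authorGivenName1": "Ana", "authorFamilyName1": "Silva", "fileName": "0001.pdf"}]
    print(records_to_tsv(rows))
```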

Edit: I definitely agree with @twa above that you should always do a test import before importing to a production server.