Migrating data [Import & Export] [OJS 3.1.1.4]

Chrizze · October 29, 2018, 10:16am

OJS Version: 3.1.1.4
PHP Version: 5.6.38
Database: 10.1.35-MariaDB

I am currently generating an XML file by hand, to be used for importing issues and numbers into my current OJS installation.

I need some help producing a working XML file for import into OJS. I tried exporting a manually created number and issue from a local OJS (version 3.1.0.1 with PHP 7.x), but I get a lot of errors when generating the XML file, and I also get an error when trying to import the generated XML.

When generating the XML, I get an invalid XML-file along with the error message:
Element ‘{http://pkp.sfu.ca}sections’: Missing child element(s). Expected is ( {http://pkp.sfu.ca}section ).

The project is about migrating a static HTML-based magazine (written in various versions of HTML) to OJS, along with images, attached PDFs and external links. I have developed a C#-based application to scrape the data and generate a valid XML file.

I need some tips and pointers on how to migrate this type of magazine into OJS, and where to find a valid XML-file to use as a template when generating my XML.

Are there any other ways of importing numbers and issues to OJS? Scraping data to a database, or XML? Any other good ideas, tips and tricks you can share will be greatly appreciated.

primozs · October 29, 2018, 2:12pm

Hi @Chrizze,

Producing XML to be imported is not so easy. Maybe you can find some useful info here: https://github.com/pkp/ojs/tree/master/plugins/importexport/native
There is an XML example.

Regards, Primož

Chrizze · October 29, 2018, 2:38pm

Yes, I’ve already used the plugin. I’ve made a mock-up from the given examples that works for importing some example articles. But that is just about it. That native importer is not optimal, however.

I am currently trying to write a small console application in C# to scrape data from HTML-files and map them to XML-tags before generating an XML-file. There’s a lot of “reverse engineering” at this point. And the documentation is a bit outdated on this topic as well.

bozana · October 29, 2018, 4:21pm

Hi @Chrizze

Here is maybe a more comprehensive example of an issue import/export XML: https://github.com/pkp/ojs/blob/master/tests/data/60-content/issue.xml.
However, when you are constructing your XML, you will surely need to take into account the OJS native DTD/schema: https://github.com/pkp/ojs/blob/master/plugins/importexport/native/native.xsd and https://github.com/pkp/pkp-lib/blob/master/plugins/importexport/native/pkp-native.xsd.

Best,
Bozana

Chrizze · November 17, 2018, 12:44pm

Yes, all help is greatly appreciated!
I am using the xsd to generate classes within my application to get the correct XML-tags in the end. So far I’ve managed to get a correct XML that works for importing to OJS 3.1.1.4, and I’ve generated an XSD from it.

Visual Studio has an XSD tool to read XML schemas and generate classes from them. Now I “only” have to scrape the HTML files and populate the instantiated objects with correct data before spitting out the XML file.

I also have to index and attach PDF papers to all articles.

lvaiaoga · February 28, 2019, 7:06pm

Hi Chrizze,

Could you please help with correcting this sample XML so that it can be imported into OJS 3.1.1.4? This is a sample XML from my OJS installation. I’m trying to find a working XML template so I can import issues that have articles in PDF. This is all new to me, so any suggestions will be greatly appreciated.

<?xml version="1.0" encoding="UTF-8"?> 1 Article Title The Subtitle the abstract John Smith john.smith@your-domain.com book reviews .. base64 encoded data is here ...

Here is the error I get after I import:

"\n\t

Errors occured:

\n\t\t\t\t\t\t\t\t

Submission

\n\t\t\t

\n\t\t\t\t\t\t\t\t\t* Unknown section 1
\n\t\t\t\t\t\t\t

\n\t\t\t"

Thanks

Chrizze · March 25, 2019, 11:53am

Hi lvaiaoga,
Yes, I get those weird outputs too…but my import works fine.

For each article you want to import, create an <issue_galleys> element; in that element you want an <articles> element, and then one <article> element per article. Check the template linked in this thread.

For each submission file, you will want to use the <supplementary_file> element. This element also needs to be referenced in the <article_galley>.

Example of an article galley file reference:

<article_galley xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" approved="false" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
 <id type="internal" advice="ignore">33</id>
 <name locale="en_US">PDF Report</name>
 <seq>7</seq>
 <submission_file_ref id="43" revision="1"/>
</article_galley>

The referenced file:

<supplementary_file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" stage="proof" id="43" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
 <revision number="1" genre="Other" filename="Report.pdf" viewable="true" date_uploaded="2019-02-21" date_modified="2019-02-21" filetype="application/pdf" uploader="yourusername">
  <name locale="en_US">University, Report.pdf</name>
  <href src="http://www.yourwebsite.org/Report.pdf"></href>
 </revision>
 <publisher locale="en_US">Christer Johansson</publisher>
 <date_created>2019-02-21</date_created>
 <source locale="en_US">University</source>
</supplementary_file>
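
Since each galley's <submission_file_ref> has to point at a file element with the same id (43 in the example above), a quick consistency check before importing can catch dangling references. This is only a minimal Python sketch, not part of OJS or of the C# tool discussed in this thread. The function names are hypothetical, and it assumes that files are declared as <submission_file> or <supplementary_file> elements.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the element names that declare a file with an "id".
FILE_TAGS = {"submission_file", "supplementary_file"}


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_dangling_file_refs(xml_text):
    """Return the sorted ids used by <submission_file_ref> that no file declares."""
    root = ET.fromstring(xml_text)
    declared, referenced = set(), set()
    for node in root.iter():
        name = local_name(node.tag)
        if name in FILE_TAGS:
            declared.add(node.get("id", ""))
        elif name == "submission_file_ref":
            referenced.add(node.get("id", ""))
    return sorted(referenced - declared)


if __name__ == "__main__":
    sample = (
        '<article>'
        '<supplementary_file id="43"/>'
        '<article_galley><submission_file_ref id="43" revision="1"/></article_galley>'
        '</article>'
    )
    print(find_dangling_file_refs(sample))
```

An empty list means every reference has a matching file element; any id that is returned should be fixed before running the import.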

When importing, it is important that the files are available on a web server; OJS then fetches them by URL rather than from a Base64 embed. I could mock up a generic template, but the template linked in this thread works fine.
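
For anyone generating the XML from a script, the URL-based variant can be sketched as follows. This is a rough Python illustration (the thread's own tool is written in C#), and `build_supplementary_file` is a hypothetical helper. It only mirrors the attributes shown in the example above and leaves out the namespace attributes and the <publisher>, <date_created> and <source> children, which would still have to be added as the schema requires.

```python
import xml.etree.ElementTree as ET


def build_supplementary_file(file_id, filename, url, uploader, date, genre="Other"):
    """Build a <supplementary_file> whose revision points at a URL via <href>."""
    sf = ET.Element("supplementary_file", {"stage": "proof", "id": str(file_id)})
    rev = ET.SubElement(sf, "revision", {
        "number": "1",
        "genre": genre,
        "filename": filename,
        "viewable": "true",
        "date_uploaded": date,
        "date_modified": date,
        "filetype": "application/pdf",  # assumption: every attached file is a PDF
        "uploader": uploader,
    })
    name = ET.SubElement(rev, "name", {"locale": "en_US"})
    name.text = filename
    # The file is referenced by URL instead of being embedded as Base64 data.
    ET.SubElement(rev, "href", {"src": url})
    return sf


if __name__ == "__main__":
    elem = build_supplementary_file(
        43, "Report.pdf", "http://www.yourwebsite.org/Report.pdf",
        "yourusername", "2019-02-21",
    )
    print(ET.tostring(elem, encoding="unicode"))
```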

You can also manually create a journal, then manually create an article and make sure it has everything you want it to have in production. Export this to XML and use that file as a template for your further imports, changing only the Base64 data to an href reference.
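
Swapping the Base64 data for an href can also be scripted. The Python sketch below is a hedged illustration, not a tested migration tool. It assumes the export stores file data in an <embed> child of each <revision>, that the href URL is simply a base URL plus the revision's filename attribute, and that the files have already been uploaded there. `embeds_to_hrefs` and the base URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Serialize the PKP namespace (seen in the error message earlier in the thread)
# as the default namespace rather than with generated ns0: prefixes.
ET.register_namespace("", "http://pkp.sfu.ca")


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def embeds_to_hrefs(xml_text, base_url):
    """Replace each <embed> inside a <revision> with <href src=base_url/filename>."""
    root = ET.fromstring(xml_text)
    # Collect the revisions first so the tree is not modified while iterating.
    revisions = [n for n in root.iter() if local_name(n.tag) == "revision"]
    for rev in revisions:
        for index, child in enumerate(list(rev)):
            if local_name(child.tag) != "embed":
                continue
            href_tag = child.tag[: -len("embed")] + "href"  # keep the namespace
            src = base_url.rstrip("/") + "/" + quote(rev.get("filename", ""))
            href = ET.Element(href_tag, {"src": src})
            href.tail = child.tail
            rev.remove(child)
            rev.insert(index, href)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    exported = (
        '<supplementary_file id="43">'
        '<revision number="1" filename="Report.pdf">'
        '<name locale="en_US">Report.pdf</name>'
        '<embed encoding="base64">QUJD</embed>'
        '</revision>'
        '</supplementary_file>'
    )
    print(embeds_to_hrefs(exported, "http://www.yourwebsite.org/"))
```

It is worth re-checking the result against native.xsd before importing, since the sketch does not validate anything.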