Migrating data [Import & Export] [OJS 3.1.1.4]

OJS Version: 3.1.1.4
PHP Version: 5.6.38
Database: 10.1.35-MariaDB

I am currently in the process of manually generating an XML-file with data, to be used for importing issues and numbers into my current OJS installation.

I need help producing a working XML file for import into OJS. I tried exporting a manually created number and issue from a local OJS (version 3.1.0.1 with PHP 7.x), but I get a lot of errors when generating the XML file, and I also get an error when trying to import the generated XML.

When generating the XML, I get an invalid XML-file along with the error message:
Element '{http://pkp.sfu.ca}sections': Missing child element(s). Expected is ( {http://pkp.sfu.ca}section ).
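The error above says that a `sections` element has no `section` child. A minimal sketch of a pre-import check for that condition, using only Python's standard library; the namespace comes from the error message, while the function name `find_empty_sections` and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET

PKP_NS = "http://pkp.sfu.ca"  # namespace taken from the error message


def find_empty_sections(xml_text):
    """Count <sections> elements that contain no <section> child."""
    root = ET.fromstring(xml_text)
    sections_tag = f"{{{PKP_NS}}}sections"
    section_tag = f"{{{PKP_NS}}}section"
    return sum(
        1
        for el in root.iter(sections_tag)
        if el.find(section_tag) is None
    )


# Hypothetical, heavily shortened document that reproduces the condition.
sample = '<issue xmlns="http://pkp.sfu.ca"><sections/></issue>'
print(find_empty_sections(sample))
```

This only looks for the one condition named in the error; it is not a replacement for validating against the schema.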

The project is about migrating a static HTML-based (various versions of HTML) magazine to OJS, along with images, attached PDF and external links. I have developed a C# based application to scrape data and generate a valid XML file.

I need some tips and pointers on how to migrate this type of magazine into OJS, and where to find a valid XML-file to use as a template when generating my XML.

Are there any other ways of importing numbers and issues to OJS? Scraping data to a database, or XML? Any other good ideas, tips and tricks you can share will be greatly appreciated.

Hi @Chrizze,

Producing XML to be imported is not so easy. Maybe you can find some useful info here: https://github.com/pkp/ojs/tree/master/plugins/importexport/native
There is an XML example.

Regards, Primož

Yes, I've already used the plugin. I've made a mock-up from the given examples that works for importing some example articles, but that is about it. The native importer is not optimal, however.

I am currently trying to write a small console application in C# to scrape data from the HTML files and map it to XML tags before generating an XML file. There's a lot of "reverse engineering" at this point, and the documentation on this topic is a bit outdated as well.
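As an illustration of the scrape-and-map step (in Python rather than C#, to keep it short), here is a rough sketch that pulls a title out of an HTML page and wraps it in namespaced XML. It assumes the title sits in the first `<h1>` and that the target elements are `article` and `title` with a `locale` attribute, as in the linked examples; `TitleScraper` and `html_to_article` are made-up names:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

PKP_NS = "http://pkp.sfu.ca"


class TitleScraper(HTMLParser):
    """Collects the text of the first <h1> (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def html_to_article(html_text, locale="en_US"):
    """Map the scraped title onto an <article><title> element pair."""
    scraper = TitleScraper()
    scraper.feed(html_text)
    article = ET.Element(f"{{{PKP_NS}}}article")
    title = ET.SubElement(article, f"{{{PKP_NS}}}title", {"locale": locale})
    title.text = scraper.title.strip()
    return article


ET.register_namespace("", PKP_NS)  # serialize with a default namespace
page = "<html><body><h1>Article Title</h1><p>the abstract</p></body></html>"
xml_out = ET.tostring(html_to_article(page), encoding="unicode")
print(xml_out)
```

A real article needs many more elements than this; the point is only the shape of the mapping from scraped text to XML.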

Hi @Chrizze

Here is maybe a more comprehensive example of an issue import/export XML: https://github.com/pkp/ojs/blob/master/tests/data/60-content/issue.xml.
However, when you are constructing your XML, you will surely need to take into account the OJS native DTD/schema: https://github.com/pkp/ojs/blob/master/plugins/importexport/native/native.xsd and https://github.com/pkp/pkp-lib/blob/master/plugins/importexport/native/pkp-native.xsd.

Best,
Bozana


Yes, all help is greatly appreciated! :)
I am using the XSD to generate classes within my application to get the correct XML tags in the end. So far I've managed to get a correct XML that works for importing to OJS 3.1.1.4, and I've generated an XSD from it.

Visual Studio has an XSD tool to read XML schemas and generate classes from them. Now I "only" have to scrape the HTML files and populate the methods and instantiated objects with correct data before spitting out the XML file.

I also have to index the PDF papers and attach them to all articles.

Hi Chrizze,

Could you please help me correct this sample XML so that it can be imported into OJS 3.1.1.4? This is a sample XML from my OJS installation. I'm trying to find a working XML template so I can import issues that have articles in PDF. This is all new to me, so any suggestions will be greatly appreciated.

<?xml version="1.0" encoding="UTF-8"?> 1 Article Title The Subtitle the abstract John Smith john.smith@your-domain.com book reviews .. base64 encoded data is here ...

Here is the error I get after I import:

"\n\t

Errors occured:

\n\t\t\t\t\t\t\t\t

  1. Submission

\n\t\t\t

\n\t\t\t\t\t\t\t\t\t* Unknown section 1
\n\t\t\t\t\t\t\t

\n\t\t\t"

Thanks

Hi lvaiaoga,
Yes, I get those weird outputs too... but my import works fine.

For each article you want to import, you need to create an <issue_galleys> element; in that element you need an <articles> element, and then one <article> element for each article. Check the template linked in this thread.

For each submission file, you will want to use the <supplementary_file> element. This element also needs to be referenced in the <article_galley>.

Example of an article galley file reference:

<article_galley xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" approved="false" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
 <id type="internal" advice="ignore">33</id>
 <name locale="en_US">PDF Report</name>
 <seq>7</seq>
 <submission_file_ref id="43" revision="1"/>
</article_galley>

The referenced file:

<supplementary_file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" stage="proof" id="43" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
 <revision number="1" genre="Other" filename="Report.pdf" viewable="true" date_uploaded="2019-02-21" date_modified="2019-02-21" filetype="application/pdf" uploader="yourusername">
 <name locale="en_US">University, Report.pdf</name>
 <href src="http://www.yourwebsite.org/Report.pdf"></href>
 </revision>
 <publisher locale="en_US">Christer Johansson</publisher>
 <date_created>2019-02-21</date_created>
 <source locale="en_US">University</source>
</supplementary_file>
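Since each <submission_file_ref> has to point at an existing file (id="43", revision="1" in the two snippets above), a quick consistency check before importing can save a round trip. This is only a sketch: `unresolved_file_refs` is a made-up helper, it only considers <supplementary_file> elements, and the wrapping <article> element in the sample is an assumption used to declare the namespace.

```python
import xml.etree.ElementTree as ET

PKP_NS = "http://pkp.sfu.ca"


def unresolved_file_refs(xml_text):
    """Return (id, revision) pairs referenced by galleys but not defined."""
    root = ET.fromstring(xml_text)

    # Every (file id, revision number) pair that the document defines.
    defined = set()
    for file_el in root.iter(f"{{{PKP_NS}}}supplementary_file"):
        for rev in file_el.findall(f"{{{PKP_NS}}}revision"):
            defined.add((file_el.get("id"), rev.get("number")))

    # Every pair that a galley refers to.
    refs = [
        (ref.get("id"), ref.get("revision"))
        for ref in root.iter(f"{{{PKP_NS}}}submission_file_ref")
    ]
    return [r for r in refs if r not in defined]


# Shortened sample: the second galley points at a file that does not exist.
sample = """
<article xmlns="http://pkp.sfu.ca">
  <supplementary_file stage="proof" id="43">
    <revision number="1" filename="Report.pdf"/>
  </supplementary_file>
  <article_galley>
    <submission_file_ref id="43" revision="1"/>
  </article_galley>
  <article_galley>
    <submission_file_ref id="44" revision="1"/>
  </article_galley>
</article>
"""
print(unresolved_file_refs(sample))
```

If other file element types are used in the XML, they would need to be added to the lookup as well.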

When importing, it is important that the files are available on a web server; OJS then fetches them from the URL rather than using a Base64 embed. I could mock up a generic template, but the template linked in this thread works fine.

You can also manually create a journal, create an article manually, and make sure it has everything you want it to have in production. Then export it to XML and use that file as a template for your further imports, changing only the Base64 data to an href reference.
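The Base64-to-href swap can be scripted. A rough sketch, assuming the exported file keeps the Base64 data in an <embed> child of <revision> (check this against your own export) and that the file is reachable under a base URL plus the `filename` attribute; `embed_to_href` and the sample values are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

PKP_NS = "http://pkp.sfu.ca"


def embed_to_href(xml_text, base_url):
    """Replace each <embed> inside a <revision> with <href src="...">."""
    ET.register_namespace("", PKP_NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    # Materialize the list first so the tree is not modified mid-iteration.
    for revision in list(root.iter(f"{{{PKP_NS}}}revision")):
        embed = revision.find(f"{{{PKP_NS}}}embed")
        if embed is None:
            continue
        position = list(revision).index(embed)
        revision.remove(embed)
        # Assumption: the public URL is the base URL plus the file name.
        url = base_url.rstrip("/") + "/" + quote(revision.get("filename", ""))
        href = ET.Element(f"{{{PKP_NS}}}href", {"src": url})
        revision.insert(position, href)
    return ET.tostring(root, encoding="unicode")


# Shortened sample with placeholder Base64 data.
exported = (
    '<supplementary_file xmlns="http://pkp.sfu.ca" stage="proof" id="43">'
    '<revision number="1" filename="Report.pdf" filetype="application/pdf">'
    '<name locale="en_US">Report.pdf</name>'
    '<embed encoding="base64">JVBERi0=</embed>'
    "</revision>"
    "</supplementary_file>"
)
converted = embed_to_href(exported, "http://www.yourwebsite.org")
print(converted)
```

The files still have to be uploaded to the web server separately; this only rewrites the references.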