Problem with native XML article import

Greetings all,
I’m trying to migrate content from an old online journal into a new OJS 3.1.1 instance on my localhost. The system throws the following errors:

Validation errors:

Input is not proper UTF-8, indicate encoding ! Bytes: 0xF8 0xED 0x76 0xE1
The document has no document element.

I’m using the sample XML file provided in the importexport/native folder and populating the required elements and attributes with a Python script. The weird thing is that when I fill out the sample.xml template manually and upload it, everything works fine, but when I try to import my Python-generated XML file (which looks exactly the same), the system gives me the errors above. Any advice would be much appreciated. Thanks in advance.

PS: I am pulling all the data I can get from the journal’s old website, for articles as well as issues. Is there a way to import these articles into their respective issues at the same time? The end goal is an OJS instance populated with articles in their respective issues.

Hi @Matus_Muransky,

I think you’ve got an issue with your character encoding; check that e.g. accented characters are encoded UTF-8 rather than e.g. Latin1. A good programmer’s editor can help with this, as can tools like xmllint.

Regards,
Alec Smecher
Public Knowledge Project Team
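
The bytes in the validation error (0xF8 0xED 0x76 0xE1) are not valid UTF-8, which is consistent with the encoding theory above. A minimal Python sketch of checking and re-encoding the data follows. The helper name to_utf8 and the cp1250 fallback are assumptions for illustration only; the thread does not say which legacy encoding the old site used.

```python
def to_utf8(raw: bytes, fallback: str = "cp1250") -> bytes:
    """Return `raw` as UTF-8 bytes, re-encoding from `fallback` if needed."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave untouched
        return raw
    except UnicodeDecodeError:
        # Assumption: the legacy data is in `fallback` (hypothetical choice).
        return raw.decode(fallback).encode("utf-8")


bad = b"\xf8\xed\x76\xe1"  # the bytes quoted in the validation error
fixed = to_utf8(bad)
print(fixed.decode("utf-8"))
```

When the XML is written from Python, opening the output file with open(path, "w", encoding="utf-8") avoids relying on the platform default encoding, which on Windows is often not UTF-8.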

Hi @asmecher,
first of all, thank you very much. I forgot to encode the whole thing as UTF-8 before generating it. Now, as I mentioned in the P.S. of the original post, I am trying to import articles and issues as well. Right now, if I try to import an XML file containing the article with the associated issue information, the system responds with neither an error nor a success message. The result page just stays blank. I’m posting the XML file below.

  <?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://pkp.sfu.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	locale="en_US" section_ref="ART" date_submitted="2018-05-07" date_published="2018-02-28" xsi:schemaLocation="http://pkp.sfu.ca native.xsd"
	stage="submission">
	<id type="internal">5</id>
	<title locale="en_US">High Efficiency Classes of RF Amplifiers</title>
	<prefix locale="en_US"></prefix>
	<subtitle locale="en_US"></subtitle>
	<abstract locale="en_US">This article is dealing with high efficiency RF amplifiers in modern classes F, E and J. The first part is focused on basic function, main parameters and the output matching topologies of the mentioned classes. Output voltage and current waveforms were simulated for each class of high efficiency amplifiers. The primary focus of this work is the practical design of class F amplifier for 435 MHz band with E-pHEMT transistor. Power added efficiency (PAE) of amplifier achieved 58% and output power was 27 dBm with 14 dBm of input power. Amplifier was realized exclusively with lumped components in order to adhere to the given dimensions. Class F amplifiers designed at megahertz frequencies and with E-pHEMT transistor are quite rare and this article could help designers with understanding narrowband F-class amplifiers with higher efficiency. This amplifier can be used in long range IoT application, because of its low consumption of energy which is necessary in this modern technology. All results were simulated within ADS Keysight environment. Every simulation was realized with nonlinear models from Modelithics.</abstract>
	<authors xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
		<author primary_contact="true" user_group_ref="Journal manager">
			<firstname>Erik</firstname>
			<lastname>Herceg,</lastname>
			<email>placeholder@email.com}</email>
		</author>
		<author primary_contact="true" user_group_ref="Journal manager">
			<firstname>Tomáš</firstname>
			<lastname>Urbanec</lastname>
			<email>placeholder@email.com}</email>
		</author>
	</authors>
	<submission_file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		stage="submission" id="1" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
		<revision number="1" genre="Article Text"
			filename="high-efficiency-classes-of-rf-amplifiers.pdf" viewable="true"
			date_uploaded="2018-02-28" date_modified="2018-02-28" filesize="1227218"
			filetype="application/pdf"
			uploader="a">
			<name locale="en_US"></name>
		</revision>
	</submission_file>
	<issue_identification>
	<volume>20</volume>
	<number>1</number>
	<year>2018</year>
	<title locale="en_US"></title>
</issue_identification>
  	<pages>6</pages>
</article>

Hi @Matus_Muransky,

Blank pages usually indicate a PHP error. Check your PHP error log for details.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
since I’m new to OJS and pretty new to PHP and web apps, could you please explain what I am doing wrong? I don’t want to be a bother, but I could really use a hint here. I’m posting the error log below.
Again, thank you very much in advance!

 [07-May-2018 17:30:36 UTC] PHP Warning:  implode(): Invalid arguments passed in C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php on line 260

[07-May-2018 17:30:36 UTC] PHP Stack trace:

[07-May-2018 17:30:37 UTC] PHP   1. {main}() C:\wamp64\www\ojs\index.php:0

[07-May-2018 17:30:37 UTC] PHP   2. PKPApplication->execute() C:\wamp64\www\ojs\index.php:68

[07-May-2018 17:30:37 UTC] PHP   3. Dispatcher->dispatch() C:\wamp64\www\ojs\lib\pkp\classes\core\PKPApplication.inc.php:247

[07-May-2018 17:30:37 UTC] PHP   4. PKPPageRouter->route() C:\wamp64\www\ojs\lib\pkp\classes\core\Dispatcher.inc.php:134

[07-May-2018 17:30:37 UTC] PHP   5. PKPRouter->_authorizeInitializeAndCallRequest() C:\wamp64\www\ojs\lib\pkp\classes\core\PKPPageRouter.inc.php:233

[07-May-2018 17:30:37 UTC] PHP   6. call_user_func:{C:\wamp64\www\ojs\lib\pkp\classes\core\PKPRouter.inc.php:372}() C:\wamp64\www\ojs\lib\pkp\classes\core\PKPRouter.inc.php:372

[07-May-2018 17:30:37 UTC] PHP   7. PKPToolsHandler->importexport() C:\wamp64\www\ojs\lib\pkp\classes\core\PKPRouter.inc.php:372

[07-May-2018 17:30:37 UTC] PHP   8. NativeImportExportPlugin->display() C:\wamp64\www\ojs\lib\pkp\pages\management\PKPToolsHandler.inc.php:98

[07-May-2018 17:30:37 UTC] PHP   9. NativeImportExportPlugin->importSubmissions() C:\wamp64\www\ojs\plugins\importexport\native\NativeImportExportPlugin.inc.php:137

[07-May-2018 17:30:37 UTC] PHP  10. Filter->execute() C:\wamp64\www\ojs\plugins\importexport\native\NativeImportExportPlugin.inc.php:292

[07-May-2018 17:30:37 UTC] PHP  11. NativeXmlArticleFilter->process() C:\wamp64\www\ojs\lib\pkp\classes\filter\Filter.inc.php:449

[07-May-2018 17:30:37 UTC] PHP  12. NativeImportFilter->process() C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php:80

[07-May-2018 17:30:37 UTC] PHP  13. NativeXmlArticleFilter->handleElement() C:\wamp64\www\ojs\lib\pkp\plugins\importexport\native\filter\NativeImportFilter.inc.php:60

[07-May-2018 17:30:37 UTC] PHP  14. NativeXmlSubmissionFilter->handleElement() C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php:69

[07-May-2018 17:30:37 UTC] PHP  15. NativeXmlArticleFilter->populateObject() C:\wamp64\www\ojs\lib\pkp\plugins\importexport\native\filter\NativeXmlSubmissionFilter.inc.php:84

[07-May-2018 17:30:37 UTC] PHP  16. NativeXmlSubmissionFilter->populateObject() C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php:121

[07-May-2018 17:30:37 UTC] PHP  17. NativeXmlArticleFilter->populatePublishedSubmission() C:\wamp64\www\ojs\lib\pkp\plugins\importexport\native\filter\NativeXmlSubmissionFilter.inc.php:116

[07-May-2018 17:30:38 UTC] PHP  18. NativeXmlArticleFilter->parseIssueIdentification() C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php:214

[07-May-2018 17:30:38 UTC] PHP  19. implode() C:\wamp64\www\ojs\plugins\importexport\native\filter\NativeXmlArticleFilter.inc.php:260

[07-May-2018 17:30:38 UTC] ojs2: DB Error: Data truncated for column 'seq' at row 1

About this error, my theory is that you are trying to add an article to a volume that does not exist. Judging by the source at NativeXmlArticleFilter.inc.php:260 (from the error log), a volume with those identifiers must already exist in the database before an article can be added to it.
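
To test that theory, it may help to print the issue identifiers the import file actually carries and compare them by hand with the issues that exist in the journal. The following is only a sketch: the function name issue_identification is hypothetical, and the element names and namespace are taken from the XML posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

NS = {"pkp": "http://pkp.sfu.ca"}  # namespace used by the posted XML


def issue_identification(article_xml: str) -> dict:
    """Collect the child values of <issue_identification> from one article."""
    root = ET.fromstring(article_xml)
    ident = root.find("pkp:issue_identification", NS)
    if ident is None:
        return {}
    values = {}
    for child in ident:
        tag = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        values[tag] = (child.text or "").strip()
    return values


sample = """<article xmlns="http://pkp.sfu.ca">
  <issue_identification>
    <volume>20</volume>
    <number>1</number>
    <year>2018</year>
    <title locale="en_US"></title>
  </issue_identification>
</article>"""
print(issue_identification(sample))
```

An empty value (such as the title here) is worth noting when comparing against the existing issue.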

Hi @Matus_Muransky,

The relevant error message is right at the bottom of the log snippet you included:

ojs2: DB Error: Data truncated for column 'seq' at row 1

I suspect there’s a stack trace immediately after that; can you post it?

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher ,
Well, it stops right there. The DB error message was the last one in the log every time I tried to import the XML file.

Hi @Matus_Muransky,

Hmm, odd that you’re seeing stack traces for the warnings but not for the error. Is your show_stacktrace setting turned on in config.inc.php?

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
so I checked config.inc.php and turned show_stacktrace on, but the log file still doesn’t show the stack trace for the DB error. It behaves the same way as before.

Hi @Matus_Muransky,

Hmm, another way of determining the problem query is to turn on the debug option in config.inc.php right before running the import. This will cause OJS to dump all SQL queries as they’re executed. The last one before the error will be the query that causes it.

Regards,
Alec Smecher
Public Knowledge Project Team

Dear all,
I am stuck at the same point, using the command line in OJS 3.1.2-4.
The XML was generated with @ajnyga's tsvConverter with minor replacements (givenname, familyname) and validates against native.xsd.

The issue_id is correct.

This is the part of debug output just before the error occurs:

PKP-Database-Logger 1581352906,6934: Query: INSERT INTO published_submissions
(submission_id, issue_id, date_published, seq, access_status)
VALUES
(869, 48, '2001-03-01 00:00:00', '', '') failed. Data truncated for column 'seq' at row 1
PKP-Database-Logger 1581352906,6934: 1265: Data truncated for column 'seq' at row 1
ADOConnection._Execute(INSERT INTO published_submissions (submission_id, issue_id, date_published, seq, access_status) VALUES (869, 48, '20…)% line 1032, file: /home/klaus/ojs/lib/pkp/lib/adodb/adodb.inc.php
ADOConnection.Execute(INSERT INTO published_submissions (submission_id, issue_id, date_published, seq, access_status) VALUES (869, 48, '20…, Array[1])% line 228, file: /home/klaus/ojs/lib/pkp/classes/db/DAO.inc.php
DAO.update(INSERT INTO published_submissions (submission_id, issue_id, date_published, seq, access_status) VALUES (?, ?, '2001-…, Array[4])% line 559, file: /home/klaus/ojs/classes/article/PublishedArticleDAO.inc.php
PublishedArticleDAO.insertObject(Object:PublishedArticle)% line 118, file: /home/klaus/ojs/lib/pkp/plugins/importexport/native/filter/NativeXmlSubmissionFilter.inc.php
NativeXmlSubmissionFilter.populateObject(Object:Article, Object:DOMElement)% line 121, file: /home/klaus/ojs/plugins/importexport/native/filter/NativeXmlArticleFilter.inc.php

DB Error: Data truncated for column 'seq' at row 1

Stack Trace:

Interestingly, the value of seq seems to be empty. Any ideas?

Best regards
Klaus

I found what fixed it for me:
the <article xmlns:xsi=" ... entries need the attributes seq="<number>" (the order within the section) and access_status="0".
BTW: What does access_status mean?
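
The fix described above could be applied to a generated file with a short Python script. This is a sketch under assumptions, not the tsvConverter's own code: the function name add_publication_attrs is hypothetical, the <articles> wrapper in the sample is assumed, and seq is simply numbered in document order within each section_ref. The value "0" for access_status is the one reported to work in this thread.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PKP_NS = "http://pkp.sfu.ca"
ET.register_namespace("", PKP_NS)  # keep the default namespace on output


def add_publication_attrs(xml_text: str) -> str:
    """Add seq and access_status to every <article> that lacks them."""
    root = ET.fromstring(xml_text)
    counters = defaultdict(int)  # running position per section_ref
    for article in root.iter(f"{{{PKP_NS}}}article"):
        section = article.get("section_ref", "")
        counters[section] += 1
        # Existing values are kept; only missing attributes are filled in.
        article.attrib.setdefault("seq", str(counters[section]))
        article.attrib.setdefault("access_status", "0")
    return ET.tostring(root, encoding="unicode")


sample = """<articles xmlns="http://pkp.sfu.ca">
  <article section_ref="ART"/>
  <article section_ref="ART"/>
  <article section_ref="REV"/>
</articles>"""
print(add_publication_attrs(sample))
```

Numbering per section_ref is one reading of "the order within the section"; a different ordering rule would only change how the counter is kept.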

Can you tag this discussion in the tsvConverter issues? I have not had time to upgrade the tool in a while…


See pull request: https://github.com/ajnyga/tsvConverter/pull/14