A question about base64 encoding

Vitor_R · December 17, 2019, 11:48am

Hello people,
I have a question: I want to parse XML with Python 3.8.
The project is very simple: I want to transform the scielo.org XML into the Open Journal Systems XML.
But I don’t understand how this part of the tags works:

embed mime_type encoding

<galley locale="en_US">
<label>PDF</label>
<file>
<embed mime_type="application/pdf" encoding="base64" filename="example.pdf">
</embed>
</file>
</galley>

From what I understand, the ‘giant’ part that sits inside the ‘embed’ tag is what carries the PDF?
Could someone explain this to me in more detail?
I really need to be able to do it this way, since there are almost 7k articles.
Thank you very much in advance to anyone who takes the time to read this, and to anyone who also wants to learn or has questions.
I apologize in advance if I haven’t described it clearly.
PS: I can build the XML template for either version 3.1.2 or 2.4.8.

Hugs.

ctgraham · December 17, 2019, 2:41pm

Base64 encoding is a way to represent binary data in ASCII characters.

In Python, you will want to base64-encode the raw PDF bytes and use the resulting text as the content of the embed element. Note that this will make your XML file(s) very large, since base64 output is roughly a third larger than the binary input.
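As a rough illustration of that step, here is a minimal sketch using only the standard library. The element and attribute names are copied from the snippet in the question; the function name `build_galley` is made up for this example, and the exact structure your OJS version expects may differ, so check it against the schema for your release.

```python
import base64
import xml.etree.ElementTree as ET


def build_galley(pdf_bytes, filename, locale="en_US"):
    """Return a <galley> element with the PDF embedded as base64 text.

    Hypothetical helper: the element names follow the snippet in the
    question and are not checked against any OJS schema here.
    """
    galley = ET.Element("galley", {"locale": locale})
    ET.SubElement(galley, "label").text = "PDF"
    file_el = ET.SubElement(galley, "file")
    embed = ET.SubElement(file_el, "embed", {
        "mime_type": "application/pdf",
        "encoding": "base64",
        "filename": filename,
    })
    # b64encode returns bytes; decode to ASCII so it can be element text.
    embed.text = base64.b64encode(pdf_bytes).decode("ascii")
    return galley


# Usage (the path is a placeholder):
# with open("example.pdf", "rb") as fh:
#     galley = build_galley(fh.read(), "example.pdf")
# print(ET.tostring(galley, encoding="unicode"))
```

The long run of characters between the opening and closing embed tags is exactly this encoded string, so it is indeed the part that carries the PDF.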

Alternatively, instead of embedding the content, you can link to a local or remote file, which the OJS XML allows. E.g.:

https://github.com/pkp/ojs/blob/54bec18568f8e842db79c6cd37e9be477e842ee7/plugins/importexport/native/sample.xml#L57


		</revision>
	</submission_file>
	<submission_file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		stage="proof" id="2" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
		<revision number="1" genre="Article Text"
		filename="article.html" viewable="true"
		date_uploaded="2014-03-06" date_modified="2014-03-06"
		filetype="text/html"
		uploader="admin">
			<name locale="en_US">name of the HTML file</name>
			<href src="http://URLTo/article.html"></href>
		</revision>
	</submission_file>
	<artwork_file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		stage="dependent" id="3" xsi:schemaLocation="http://pkp.sfu.ca native.xsd">
		<revision number="1" genre="Image"
			filename="images.png" viewable="false"
			date_uploaded="2014-03-06" date_modified="2014-03-06"
			filetype="image/png"
			uploader="admin">
			<name locale="en_US">image name</name>

In the href element, OJS will try to open the path provided to find the article content, rather than reading the content from a base64 string.
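If you generate the XML from Python, the linked variant could look something like the sketch below. It only mirrors the revision element from the excerpt above: the function name `build_revision_with_href`, the default values, and the URL are placeholders, and the enclosing submission_file element and its namespace attributes are left out.

```python
import xml.etree.ElementTree as ET


def build_revision_with_href(url, filename, filetype="application/pdf",
                             genre="Article Text", uploader="admin",
                             date="2014-03-06", locale="en_US"):
    """Return a <revision> element that points at the file via <href>.

    Hypothetical helper: attribute names are taken from the sample.xml
    excerpt above; the defaults are illustrative values only.
    """
    revision = ET.Element("revision", {
        "number": "1",
        "genre": genre,
        "filename": filename,
        "viewable": "true",
        "date_uploaded": date,
        "date_modified": date,
        "filetype": filetype,
        "uploader": uploader,
    })
    name = ET.SubElement(revision, "name", {"locale": locale})
    name.text = filename
    # No base64 payload here: OJS is expected to fetch the file from src.
    ET.SubElement(revision, "href", {"src": url})
    return revision


# Usage (placeholder URL):
# rev = build_revision_with_href("http://URLTo/article.pdf", "article.pdf")
# print(ET.tostring(rev, encoding="unicode"))
```

With close to 7k articles, this keeps each XML file small, at the cost of needing every file to be reachable at import time.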

This facility is available in both 2.x and 3.x, though other parts of the schema differ between the two.