JATS XML to PDF converter

Vitaliy · April 24, 2017, 8:15pm

Greetings to all,

I am pleased to announce the first release of the JATS to PDF converter: Releases · Vitaliy-1/JATS2LaTeX · GitHub
The idea behind the converter is that once you have created JATS XML, you need no additional effort to produce a good-looking PDF.

As input, the converter takes an article in XML format and produces 2 output files: LaTeX (for the main text) and BibTeX (for the references). These can then be easily compiled with a distribution of the TeX/LaTeX typesetting system.
The examples can be found here: JATS2LaTeX/example at jats2latex-0.5 · Vitaliy-1/JATS2LaTeX · GitHub
The folder contains 2 JATS XML files, the example output tex and bib files, and one-click compiled PDFs (without modifications).

To work with the program you will need (all are free):

Java 8
A distribution of the TeX/LaTeX typesetting system. For Windows I prefer MiKTeX.
Any LaTeX editor (to finalize the look of the article). I prefer TeXstudio.
The latest release of the converter (an executable jar file).

To start the conversion on Windows, simply type the following in the Windows command prompt (Win + R → type cmd):
java -jar path/to/latex.jar path/to/article.xml path/to/article.tex path/to/article.bib
where path/to stands for the absolute or relative path to latex.jar, to the input article in JATS XML format, and to the output tex and bib files.
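
For batch use, the same command line can be assembled from a script. The following is only a minimal sketch: the function names are hypothetical and not part of JATS2LaTeX, and it assumes `java` is on the PATH and that the output files should sit next to the input file.

```python
import subprocess
from pathlib import Path


def build_convert_command(jar, xml_in, tex_out, bib_out):
    """Return the argument list for the command line shown above."""
    return ["java", "-jar", str(jar), str(xml_in), str(tex_out), str(bib_out)]


def convert(jar, xml_in):
    """Run the converter, writing the .tex and .bib files next to the input.

    Hypothetical helper: the output naming is an assumption, not something
    the converter requires.
    """
    xml_in = Path(xml_in)
    cmd = build_convert_command(
        jar, xml_in, xml_in.with_suffix(".tex"), xml_in.with_suffix(".bib")
    )
    # check=True raises CalledProcessError if java exits with an error status.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `convert("latex.jar", "article.xml")` would then be equivalent to typing the command by hand.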

To create a PDF from the files, just open the LaTeX editor and compile with XeLaTeX (it has good UTF-8 support). In MiKTeX you will need to compile several times: first with XeLaTeX, then compile the bibliography, and then compile with XeLaTeX once more. So press a button 3 times. Actually, this is the standard procedure for LaTeX compiling.
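
The three compile steps can also be run outside the editor. This is a sketch under assumptions: the post only says "compile bibliography", so `bibtex` as the bibliography tool is a guess (pass `bib_tool="biber"` if the generated file uses biblatex), and both function names are hypothetical.

```python
import subprocess


def compile_steps(tex_file, bib_tool="bibtex"):
    """Return the three commands in the order described above.

    bib_tool is an assumption; the original post does not name the tool.
    """
    stem = tex_file[:-4] if tex_file.endswith(".tex") else tex_file
    xelatex = ["xelatex", "-interaction=nonstopmode", tex_file]
    # XeLaTeX, then the bibliography tool on the file stem, then XeLaTeX again.
    return [list(xelatex), [bib_tool, stem], list(xelatex)]


def compile_pdf(tex_file, bib_tool="bibtex"):
    """Run the steps in order, stopping at the first command that fails."""
    for cmd in compile_steps(tex_file, bib_tool):
        subprocess.run(cmd, check=True)
```

If citations still appear unresolved in the PDF after these three steps, one more XeLaTeX run may be needed.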

The program can be modified to produce PDF from JATS automatically. But because we use it for production purposes, we prefer to make the article’s look perfect, and this may require changing the output tex.

Features:

UTF-8 support
Parsing of sections, subsections and subsubsections. Italic and bold text is supported. In-text links to the bibliography, tables and figures are also supported. LaTeX special symbols are replaced. If I forgot a symbol that needs to be replaced, please tell me; it can be fixed in no time. Or, if you are familiar with Java, it can be fixed here.
Parsing of figures. Moreover, the converter tries to download the figure if the JATS contains a valid URL link to the file.
Parsing of tables. Because tables in XML and LaTeX are very different, problems may occur in complex tables. For example, if a row contains a multicolumn cell followed by a multirow cell, the parser may place the “&” symbol in the wrong place in the next row. I am planning to fix this in a future release. Nevertheless, it takes several seconds to fix if one knows LaTeX syntax.
Parsing of the reference list. As I noted earlier, the converter produces the reference list in bib format. As far as I know, bib is the most widely used format for bibliography exchange; moreover, with a bib reference list, references in LaTeX can be styled in any widely used reference style (Vancouver, APA etc.). The program supports references to journals, books, chapters and conferences. We prefer to code the type explicitly in the JATS element-citation element, like: <element-citation publication-type="journal">. You can find our JATS in the examples folder (articles are copyrighted by authors). If the citation type is not explicitly specified, the parser will look at the tags used inside the element-citation element.
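
To illustrate the last point, here is a rough sketch of how a citation type could be guessed from the tags inside element-citation. It is not the converter's actual logic: the function name and the specific rules (conf-name, chapter-title, article-title, with book as the fallback) are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def guess_publication_type(citation):
    """Guess a reference type for one <element-citation> element.

    Hypothetical heuristic, not taken from the JATS2LaTeX source code.
    """
    explicit = citation.get("publication-type")
    if explicit:
        # An explicit attribute always wins.
        return explicit
    tags = {element.tag for element in citation.iter()}
    if "conf-name" in tags:
        return "conference"
    if "chapter-title" in tags:
        return "chapter"
    if "article-title" in tags:
        return "journal"
    return "book"


example = ET.fromstring(
    "<element-citation><article-title>T</article-title>"
    "<source>J</source></element-citation>"
)
guessed = guess_publication_type(example)
```

Coding publication-type explicitly, as recommended above, avoids depending on any such guesswork.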

Don’t hesitate to ask questions. If you find a bug, it is better to open an issue on GitHub (on the project page). A detailed description of the converter will soon be available there.

anupent · July 27, 2017, 1:09pm

Please see updates below.

Dear @Vitaliy,
Because of your docs2jats and jatsParser, we are now gradually adopting the creation of JATS XML as the main workflow in the production phase.
So far, we have made satisfactory progress.

Now we are thinking of using this converter to automatically produce a PDF file from JATS XML. But on my first command, I get the following errors:

jats2pdf>java -jar latex.jar santosh.xml santosh.tex santosh.bib
Exception in thread "main" java.io.FileNotFoundException: D:\Odrive\JLMC\jats2pdf\JATS-journalpublishing1.dtd (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at org.emed.main.Main.writerToFile(Main.java:70)
at org.emed.main.Main.main(Main.java:52)
jats2pdf>

I have Java installed and the path set correctly (I hope).
Can you look into what I am doing wrong?

Regards,
@anupent

Updates:
There was a line of code pointing to “JATS-journalpublishing1.dtd”. I removed it, and the above errors disappeared. Then I got the errors below:

Exception in thread "main" java.lang.NullPointerException
at org.emed.main.Meta.meta(Meta.java:100)
at org.emed.main.Main.jatsParser(Main.java:114)
at org.emed.main.Main.writerToFile(Main.java:74)
at org.emed.main.Main.main(Main.java:52)

Any clues?
Warm Regards,
@anupent
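
The manual fix from the update above can be scripted. This sketch assumes the removed line was the DOCTYPE declaration, which the DTD-related frames in the first stack trace suggest; the function name is hypothetical.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset in [...].
_DOCTYPE = re.compile(r"<!DOCTYPE[^>\[]*(\[.*?\])?\s*>", re.DOTALL)


def strip_doctype(xml_text):
    """Return xml_text with its first DOCTYPE declaration removed."""
    return _DOCTYPE.sub("", xml_text, count=1)


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE article SYSTEM "JATS-journalpublishing1.dtd">\n'
    "<article/>"
)
cleaned = strip_doctype(sample)
```

If the article relies on named entities defined in the DTD, removing the declaration will make those entities unresolvable. In that case, placing the DTD file next to the XML is the safer route.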

Vitaliy · July 27, 2017, 4:08pm

Is there an abstract node in your JATS at the path /article/front/article-meta/abstract?
What attributes does it have?

anupent · July 27, 2017, 4:23pm

dear @Vitaliy,
Yes, there is article/front/article-meta/abstract. My abstract is as follows:

<abstract>
		<p><bold>Introduction:</bold> Sialolithiasis is the most common disease of the salivary glands. Majority of sialoliths occur in the submandibular gland and is a common cause of acute and chronic infections of the gland. The size varies from one mm to one cm. Size greater than 15 mm are considered unusual or giant sialolith. </p>
		<p><bold>Case report:</bold> We present a case of an unusual size sialolith of 16 mm in submandibular gland duct which was removed via transoral incision. The aim of presenting this case report is to understand etio-pathogenesis, clinical presentation and management of submandibular sialolithiasis.</p>
		<p><bold>Conclusion:</bold>  Submandicular sialolithiasis of more than 15 mm in size though rare are not uncommon. They can be managed intraorally if situated at or near the orifice.</p>
		
	</abstract>

anupent · July 27, 2017, 4:38pm

I changed my abstract as shown below, but I still get the same errors, except that the first line now says (Meta.java:98):

<abstract abstract-type="section"> <title>Abstract</title> <sec> <bold>Introduction:</bold> Sialolithiasis is the most common disease of the salivary glands. Majority of sialoliths occur in the submandibular gland and is a common cause of acute and chronic infections of the gland. The size varies from one mm to one cm. Size greater than 15 mm are considered unusual or giant sialolith. <bold>Case report:</bold> We present a case of an unusual size sialolith of 16 mm in submandibular gland duct which was removed via transoral incision. The aim of presenting this case report is to understand etio-pathogenesis, clinical presentation and management of submandibular sialolithiasis. <bold>Conclusion:</bold> Submandicular sialolithiasis of more than 15 mm in size though rare are not uncommon. They can be managed intraorally if situated at or near the orifice. </sec> </abstract>

Vitaliy · July 27, 2017, 4:47pm

Can you format the abstract with type section, as in this example: JATS2LaTeX/article_english.xml at master · Vitaliy-1/JATS2LaTeX · GitHub ?

anupent · July 27, 2017, 4:56pm

I did, and now I have other errors.
If I had a hint about where the errors originate, I could look into those sections.
Now my errors are as below:

at org.emed.latex.standard.MetaStandard.meta(MetaStandard.java:33)
at org.emed.main.Main.latexStandardWriter(Main.java:95)
at org.emed.main.Main.writerToFile(Main.java:87)
at org.emed.main.Main.main(Main.java:52)

Regards,
Anupent

Vitaliy · July 27, 2017, 5:11pm

It means that you need to specify the journal name in JATS exactly like this: JATS2LaTeX/article_english.xml at jats2latex-0.5.1 · Vitaliy-1/JATS2LaTeX · GitHub
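
Since both NullPointerExceptions in this thread came from missing or differently structured metadata, a quick check before conversion can save a round trip. This is a sketch only: the function name is hypothetical, and the two paths are assumptions based on this thread and on standard JATS structure. The exact markup the converter expects is in the linked example file.

```python
import xml.etree.ElementTree as ET

# Paths are relative to the root <article> element. They are assumptions,
# not a list taken from the converter's source code.
REQUIRED_PATHS = {
    "abstract": "front/article-meta/abstract",
    "journal title": "front/journal-meta//journal-title",
}


def missing_elements(xml_text):
    """Return the names of the required elements not found in the article."""
    root = ET.fromstring(xml_text)
    return [
        name for name, path in REQUIRED_PATHS.items() if root.find(path) is None
    ]


sample = (
    "<article><front><article-meta>"
    "<abstract><p>Text</p></abstract>"
    "</article-meta></front></article>"
)
problems = missing_elements(sample)
```

An empty list only means the elements exist. It does not guarantee that their attributes or child structure match what the converter expects.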

anupent · July 27, 2017, 5:16pm

Thank you @Vitaliy,
Now I am able to make my two files: .bib and .tex

Will update my progress from here later.

Warm regards,
@anupent

drafuse · June 14, 2021, 9:46pm

Hello @Vitaliy

I am using an XML jats4r file. After making the above corrections to the header, journal title, and abstract, I am able to create the .tex and .bib files… but after running the command in the Windows command shell, I lose the bib reference numbers in the .tex file.

For example,

In the xml the references are shown as:

(<xref ref-type="bibr" rid="redalyc_179565134003_ref52">Padua, 1994</xref>)

But when it is converted to .tex, I lose the "redalyc_179565134003_ref52", so my bibliography does not build correctly.

(\cite{bibPadua, 1994})

Any help would be great.

Thanks.
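
To see which citations are affected, the rid and the visible text of each bibliography xref can be listed from the XML and compared with the \cite keys in the generated .tex. This is only a diagnostic sketch with a hypothetical function name. It does not change how the converter builds its keys.

```python
import xml.etree.ElementTree as ET


def bibliography_xrefs(xml_text):
    """Return (rid, visible text) for every <xref ref-type="bibr">."""
    root = ET.fromstring(xml_text)
    return [
        (xref.get("rid"), "".join(xref.itertext()).strip())
        for xref in root.iter("xref")
        if xref.get("ref-type") == "bibr"
    ]


sample = (
    "<article><body><p>("
    '<xref ref-type="bibr" rid="redalyc_179565134003_ref52">Padua, 1994</xref>'
    ') and <xref ref-type="fig" rid="f1">Figure 1</xref></p></body></article>'
)
pairs = bibliography_xrefs(sample)
```

For the sample above, the single bibliography xref pairs the rid redalyc_179565134003_ref52 with the text "Padua, 1994", which is the text that ended up inside \cite{...}.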

Vitaliy · June 15, 2021, 4:03pm

Hi @drafuse,

The problem is that I moved away from LaTeX-based PDF generation in 2017-2018. I don’t remember the underlying architecture of JATS2LaTeX well. Do you specifically need the tex file, or is the PDF your final aim?

drafuse · June 15, 2021, 4:51pm

Hi @Vitaliy,

Thanks for the reply! My final aim is the PDF.

Any suggestions?

Vitaliy · June 16, 2021, 12:37pm

In this case, I’d suggest trying the JATS Parser Plugin for OJS or the library on which it’s built.

Vitaliy · June 16, 2021, 12:41pm

I use this plugin for PDF generation in our university’s journal, e.g.: View of Emotional burnout of medical workers: models, risk factors and protective factors. It’s not 100% bulletproof and doesn’t support all article elements yet, but that’s slowly improving. The only drawback is that the underlying technology (TCPDF) doesn’t allow much PDF customization.

drafuse · June 16, 2021, 4:20pm

Thank you @Vitaliy,

I think this is exactly what I was looking for, a way to generate a PDF article with metadata within the OJS where we have our journal.

Currently we are using Marcalyc to generate the jats-xml files, but we did not know how to convert these into PDFs for our journal publication.

Here is our current work flow (which we want to improve):

OJS for receiving the manuscript docx and for peer review.
If accepted for publication, we use InDesign to create a custom PDF. We want to remove this step from the workflow because InDesign is not open access, and the PDFs have no metadata.
Published articles are stored on our local website and external databases like Scielo, which requires xml. That is why we use Marcalyc.
Marcalyc exports to xml-jats4r, xml-scielo, html, epub and pdf. The problem is that the Marcalyc PDF is not customizable, and right now there are issues we cannot control (specifically with the tables).

Ideally, we would be able to do everything in OJS. So, avoid using InDesign, generate the xml-jats in Marcalyc, import that file back into OJS, and use the JATS Parser Plugin to generate the PDF with metadata. Does that sound right to you?

Any advice would be great!

Vitaliy · June 17, 2021, 11:33am

The JATS Parser Plugin also doesn’t allow control over PDF creation. The only thing that is possible is to add your own CSS, but the list of supported styles is quite minimal: https://stackoverflow.com/a/47304730/6711224. There is also a longstanding feature request to support footnotes and formulas. All other major elements are supported, but if you rely on footnotes or formulas, the current output may not suit you.

It’s possible to convert DOCX to JATS XML with the DOCX Converter Plugin and to edit it with the Texture plugin.

drafuse · June 17, 2021, 3:39pm

Thank you very much @Vitaliy.

Just tried Pandoc. It did not work out for me.