JATS XML to PDF converter


#1

Greetings to all,

I am pleased to announce the first release of the JATS to PDF converter: https://github.com/Vitaliy-1/JATS2LaTeX/releases
The idea behind the converter is that once you have created JATS XML, you need no additional effort to produce a good-looking PDF.

As input the converter takes an article in XML format and produces 2 output files: LaTeX (for the main text) and BibTeX (for the references). These can then be compiled with any distribution of the TeX/LaTeX typesetting system.
Examples can be found here: https://github.com/Vitaliy-1/JATS2LaTeX/tree/jats2latex-0.5/example
The folder contains 2 JATS XML files, the example tex and bib output, and the PDFs compiled from them in one click (without modifications).

To work with the program you will need the following (all are free):

  • Java 8

  • a distribution of the TeX/LaTeX typesetting system. For Windows I prefer MiKTeX.

  • any LaTeX editor (to finalize the look of the article). I prefer TeXstudio.

  • the latest release of the converter (an executable jar file)

To start the conversion on Windows, open the command prompt (Win + R, then type cmd) and run:
java -jar path/to/latex.jar path/to/article.xml path/to/article.tex path/to/article.bib
where path/to stands for the absolute or relative path to latex.jar, to the input article in JATS XML format, and to the output tex and bib files.

To create a PDF from these files, open them in a LaTeX editor and compile with XeLaTeX (it has good UTF-8 support). In MiKTeX you will need to compile several times: first with XeLaTeX, then compile the bibliography, and then with XeLaTeX once more. So press a button 3 times :) This is actually the standard procedure for compiling LaTeX.
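
The compile sequence described above (XeLaTeX, bibliography, XeLaTeX) can also be scripted. Below is a minimal sketch in Java, assuming that `xelatex` and the bibliography tool are on the PATH and that the bibliography step is `bibtex`; if the generated preamble relies on biber, pass `biber` instead. The class name `CompileSequence` and its methods are hypothetical and not part of the converter.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompileSequence {

    // Builds the three commands in the order described in the post:
    // XeLaTeX, bibliography tool, XeLaTeX again.
    static List<List<String>> commands(String baseName, String bibTool) {
        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList("xelatex", "-interaction=nonstopmode", baseName + ".tex"));
        steps.add(Arrays.asList(bibTool, baseName));
        steps.add(Arrays.asList("xelatex", "-interaction=nonstopmode", baseName + ".tex"));
        return steps;
    }

    // Runs each step in workDir and stops at the first step that fails.
    static void run(File workDir, String baseName, String bibTool)
            throws IOException, InterruptedException {
        for (List<String> step : commands(baseName, bibTool)) {
            Process process = new ProcessBuilder(step)
                    .directory(workDir)
                    .inheritIO()
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("Step failed (exit " + exit + "): " + String.join(" ", step));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: folder holding the tex and bib files; args[1]: base name, e.g. "article"
        run(new File(args[0]), args[1], "bibtex");
    }
}
```

If citations or cross-references still show as unresolved after these three steps, one more XeLaTeX pass usually settles them.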

The program can be modified to produce PDF from JATS automatically. But because we use it in production, we prefer to make each article look perfect, and that may require editing the output tex.

Features:

  • UTF-8 support

  • Parsing of sections, subsections and subsubsections. Italic and bold text is supported. In-text links to the bibliography, tables and figures are also supported. LaTeX special symbols are replaced. If I have missed a symbol that needs to be replaced, please tell me; it can be fixed in no time. Anyone familiar with Java can also fix it directly in the source.

  • Parsing of figures. The converter also tries to download a figure if the JATS contains a valid URL to the file.

  • Parsing of tables. Because tables in XML and LaTeX are very different, problems may occur with complex tables. For example, if a row contains a multicolumn cell followed by a multirow cell, the parser may put the “&” symbol in the wrong place in the next row. I am planning to fix this in a future release. Until then it takes a few seconds to fix by hand if one knows LaTeX syntax.

  • Parsing of the reference list. As noted earlier, the converter produces the reference list in bib format. As far as I know, bib is the most widely used format for bibliography exchange; moreover, a bib reference list can be styled in LaTeX in any widely used reference style (Vancouver, APA etc.). The program supports references to journals, books, chapters and conferences. We prefer to code references in JATS with the element-citation element, for example: <element-citation publication-type="journal">. You can find our JATS in the examples folder (articles are copyrighted by their authors). If the citation type is not stated explicitly, the parser looks at the tags used inside the element-citation element.
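
The replacement of LaTeX special symbols mentioned in the feature list can be illustrated with a short sketch. This is not the converter's actual code: the class `LatexEscape` and its mapping are assumptions that cover the ten characters LaTeX reserves. The text is processed one character at a time so that the backslashes added by one replacement are never escaped again by another.

```java
class LatexEscape {

    // Replaces each LaTeX reserved character with a printable equivalent.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': out.append("\\textbackslash{}"); break;
                case '~':  out.append("\\textasciitilde{}"); break;
                case '^':  out.append("\\textasciicircum{}"); break;
                case '{':  out.append("\\{"); break;
                case '}':  out.append("\\}"); break;
                case '$':  out.append("\\$"); break;
                case '&':  out.append("\\&"); break;
                case '#':  out.append("\\#"); break;
                case '_':  out.append("\\_"); break;
                case '%':  out.append("\\%"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: 50\% of R\&D cost\_2 is \#1
        System.out.println(escape("50% of R&D cost_2 is #1"));
    }
}
```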

Don’t hesitate to ask questions. If you find a bug, it is better to open an issue on GitHub (on the project page). A detailed description of the converter will be available there soon.


#2

Please see updates below.

Dear @Vitaliy,
Because of your docs2jats and jatsParser, we are now gradually adopting JATS XML creation as the main workflow in the production phase.
So far, we have made satisfactory progress.

Now we are thinking of using this converter to automatically produce PDF files from JATS XML. But on my first command I get the following errors:

jats2pdf>java -jar latex.jar santosh.xml santosh.tex santosh.bib
Exception in thread "main" java.io.FileNotFoundException: D:\Odrive\JLMC\jats2pdf\JATS-journalpublishing1.dtd (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at org.emed.main.Main.writerToFile(Main.java:70)
at org.emed.main.Main.main(Main.java:52)
jats2pdf>

I have Java installed and the path set correctly (I hope).
Can you look into what I am doing wrong?

Regards,
@anupent

Updates:
There was a line of code pointing to “JATS-journalpublishing1.dtd”. I removed it and the errors above disappeared. Then I got the errors below:

Exception in thread "main" java.lang.NullPointerException
at org.emed.main.Meta.meta(Meta.java:100)
at org.emed.main.Main.jatsParser(Main.java:114)
at org.emed.main.Main.writerToFile(Main.java:74)
at org.emed.main.Main.main(Main.java:52)

Any clues?
Warm Regards,
@anupent


#3

Does your JATS contain an abstract node at the path /article/front/article-meta/abstract ?
What attributes does it have?
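
One way to answer these two questions is to query the file directly. The sketch below uses the JDK's built-in XPath API to look up `/article/front/article-meta/abstract` and list its attributes. It also switches off external DTD loading, which should avoid the `FileNotFoundException` for `JATS-journalpublishing1.dtd` reported above without removing the DOCTYPE declaration from the XML. The class name `AbstractCheck` is hypothetical and the code is not taken from the converter; it assumes the JDK's default (Xerces-based) parser.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AbstractCheck {

    // Returns the attributes of the abstract node, or null when the node is absent.
    static Map<String, String> abstractAttributes(InputSource jats) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature: do not fetch the DTD named in the DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(jats);

        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate("/article/front/article-meta/abstract", doc, XPathConstants.NODE);
        if (node == null) {
            return null;
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        NamedNodeMap raw = node.getAttributes();
        for (int i = 0; i < raw.getLength(); i++) {
            attributes.put(raw.item(i).getNodeName(), raw.item(i).getNodeValue());
        }
        return attributes;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the JATS XML file
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        Map<String, String> attributes = abstractAttributes(source);
        System.out.println(attributes == null
                ? "no abstract node found"
                : "abstract attributes: " + attributes);
    }
}
```

A `null` result would be consistent with a `NullPointerException` in code that expects the node to exist, although only the converter's source can confirm what line 100 of Meta.java actually reads.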


#4

Dear @Vitaliy,
Yes, there is /article/front/article-meta/abstract. My abstract is as follows:

<abstract>
		<p><bold>Introduction:</bold> Sialolithiasis is the most common disease of the salivary glands. Majority of sialoliths occur in the submandibular gland and is a common cause of acute and chronic infections of the gland. The size varies from one mm to one cm. Size greater than 15 mm are considered unusual or giant sialolith. </p>
		<p><bold>Case report:</bold> We present a case of an unusual size sialolith of 16 mm in submandibular gland duct which was removed via transoral incision. The aim of presenting this case report is to understand etio-pathogenesis, clinical presentation and management of submandibular sialolithiasis.</p>
		<p><bold>Conclusion:</bold>  Submandicular sialolithiasis of more than 15 mm in size though rare are not uncommon. They can be managed intraorally if situated at or near the orifice.</p>
		
	</abstract>

#5

I changed my abstract as below but still get the same errors, except that the first line now shows (Meta.java:98):

<abstract abstract-type="section">
    <title>Abstract</title>
    <sec>
        <p><bold>Introduction:</bold> Sialolithiasis is the most common disease of the salivary glands. Majority of sialoliths occur in the submandibular gland and is a common cause of acute and chronic infections of the gland. The size varies from one mm to one cm. Size greater than 15 mm are considered unusual or giant sialolith. </p>
        <p><bold>Case report:</bold> We present a case of an unusual size sialolith of 16 mm in submandibular gland duct which was removed via transoral incision. The aim of presenting this case report is to understand etio-pathogenesis, clinical presentation and management of submandibular sialolithiasis.</p>
        <p><bold>Conclusion:</bold> Submandicular sialolithiasis of more than 15 mm in size though rare are not uncommon. They can be managed intraorally if situated at or near the orifice.</p>
    </sec>
</abstract>


#6

Can you format the abstract with type section, as in this example: https://github.com/Vitaliy-1/JATS2LaTeX/blob/master/example/article_english.xml ?


#7

I did, and now I have other errors.
If I had a hint about where the errors originate, I could look into that section.
My errors are now as below:

at org.emed.latex.standard.MetaStandard.meta(MetaStandard.java:33)
at org.emed.main.Main.latexStandardWriter(Main.java:95)
at org.emed.main.Main.writerToFile(Main.java:87)
at org.emed.main.Main.main(Main.java:52)

Regards,
Anupent


#8

It means that you need to specify the journal name in JATS exactly as in this example: https://github.com/Vitaliy-1/JATS2LaTeX/blob/jats2latex-0.5.1/example/article_english.xml#L9


#9

Thank you @Vitaliy,
Now I am able to generate my two files: .bib and .tex

I will post an update on my progress later.

Warm regards,
@anupent