Workflow with galleys from JATS XML to HTML and PDF

Greetings to all,

We have no experience working with XML (so far we only publish PDF galleys, and we want to add JATS XML and HTML galleys in OJS), and we wonder which tools we could use to set up a good workflow. We only need this at the end of the process, for publishing the final galleys.
So we need help with every step, from marking up or creating the JATS XML to producing HTML and PDF files from it. We would like to know which tools (software) you recommend for the best workflow.
I'm talking about tools external to OJS (I know, for instance, the XML galley plugins for OJS 2.0 and the Open Typesetting Stack project, which is in development), because we also want to create or manage the stylesheets that are applied.

We are not developers (we are mainly graphic designers), so we can only handle code at a very basic level and need “easy” software.
We would appreciate any suggestion or idea for introducing this JATS XML → HTML and PDF workflow in OJS.

Thanks very much!
Best,
Elpidio

  • JATS XML → HTML

You can use any existing XSLT stylesheet, for example this one: JATSPreviewStylesheets/jats-html.xsl at master · ncbi/JATSPreviewStylesheets · GitHub

There are many more on the internet. They can all be run with saxon9he: Saxon XSLT and XQuery Processor - Browse /Saxon-HE at SourceForge.net

Just download the archive, extract saxon9he.jar from it, and run a command in cmd (on Windows) or in a terminal (on Mac or Ubuntu): http://www.saxonica.com/documentation9.5/using-xsl/commandline.html
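
For anyone unsure what that command looks like, here is a minimal Python sketch that assembles and runs it. The `-s:`, `-xsl:` and `-o:` options are Saxon's documented command-line syntax; the file names (`article.xml`, `jats-html.xsl`, `article.html`) and the two helper functions are placeholders invented for this example, and it assumes Java is installed and `saxon9he.jar` sits in the current folder.

```python
import subprocess
from pathlib import Path


def build_saxon_command(source_xml, stylesheet, output_html, saxon_jar="saxon9he.jar"):
    """Return the argument list for a Saxon-HE command-line transformation."""
    return [
        "java", "-jar", str(saxon_jar),
        f"-s:{source_xml}",    # source JATS XML file
        f"-xsl:{stylesheet}",  # XSLT stylesheet, e.g. jats-html.xsl
        f"-o:{output_html}",   # file the HTML result is written to
    ]


def transform(source_xml, stylesheet, output_html, saxon_jar="saxon9he.jar"):
    """Run the transformation; raises CalledProcessError if Saxon reports an error."""
    command = build_saxon_command(source_xml, stylesheet, output_html, saxon_jar)
    subprocess.run(command, check=True)
    return Path(output_html)


# Placeholder file names: this only prints the command you would type in a terminal.
print(" ".join(build_saxon_command("article.xml", "jats-html.xsl", "article.html")))
```

Calling `transform("article.xml", "jats-html.xsl", "article.html")` would run the same command, so the whole step can be reduced to one double-clickable script once the file names are filled in.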

With Lens Viewer, the JATS XML is transformed to HTML in the client's browser.

I am also developing a plugin that will produce HTML from JATS on the server side and display it on the article detail page, but that work will not be finished in the near future.

  • JATS XML → PDF

We use our own converter, written in Java, which transforms JATS into LaTeX that is ready to compile. It is already working, and I will make a public release in the near future. You can follow the progress here: GitHub - Vitaliy-1/JATS2LaTeX: Java project that can be used for transformation of Journal Article Tag Suite XML to LaTeX. An example of a PDF generated this way is on our site: View of Evaluation of the evoked brain potentials of patients with asthenia and anxiety symptoms and the partial loss of sight
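
To give an idea of what such a conversion involves, here is a toy sketch in Python. It is not the JATS2LaTeX code (which is written in Java) and covers only the article title, top-level `<sec>` titles and plain paragraphs; the function names are made up for the example, and the escaping table is deliberately partial (backslash, `~` and `^` are not handled).

```python
import xml.etree.ElementTree as ET

# Partial list of characters that LaTeX treats as markup.
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                  "_": r"\_", "{": r"\{", "}": r"\}"}


def escape_latex(text):
    """Escape the special characters listed in LATEX_SPECIALS."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def plain_text(element):
    """Collect an element's text, dropping inline markup and collapsing whitespace."""
    return " ".join("".join(element.itertext()).split())


def jats_to_latex(jats_xml):
    """Convert a very small subset of JATS to a compilable LaTeX document."""
    root = ET.fromstring(jats_xml)
    lines = [r"\documentclass{article}"]
    title = root.find("./front/article-meta/title-group/article-title")
    if title is not None:
        lines.append(r"\title{" + escape_latex(plain_text(title)) + "}")
    lines.append(r"\begin{document}")
    if title is not None:
        lines.append(r"\maketitle")
    for sec in root.findall("./body/sec"):
        heading = sec.find("title")
        if heading is not None:
            lines.append(r"\section{" + escape_latex(plain_text(heading)) + "}")
        for para in sec.findall("p"):
            lines.append(escape_latex(plain_text(para)))
            lines.append("")  # a blank line ends the paragraph in LaTeX
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

A real converter also has to deal with figures, tables, formulas, nested sections and references, which is where most of the work lies.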

Thanks, Vitaliy, for your help.
I know your journal, https://e-medjournal.com/; the way it uses XML, HTML and PDF through Lens Viewer is really excellent.
We would also like to use Lens Viewer, and I have read about everything you did to adapt Word files and the rest to this process.
There's only one (big) problem for me: GitHub, the terminal, the command line… all of this sounds almost “esoteric” to me, even if it's not really complicated. So I need to use “easy” (WYSIWYG) software.

I'm trying things like Oxygen XML, but I don't know if it's the easiest way to write JATS XML and build transformations to HTML and PDF (through XSLT and an XSL-FO engine). Even this looks scary to me, but I would try it if it's the best choice.
Maybe Altova XMLSpy and StyleVision? Maybe other software?
I know that it's not necessary, but as I said… I'm an “outdated graphic designer”.

Thanks a lot.

I have been working on a workflow with exactly the criteria you describe in mind, namely something any editor could handle without using the command line and so on. However, in my opinion that is not possible at the moment without some technical skills. At the very least, you need to learn some basic coding in JATS XML.

This is what I have:

  1. I use the OJS3-markup plugin (which uses the Open Typesetting Stack) to create an XML file from docx in the production phase. GitHub - kaschioudi/ojs3-markup: markup plugin for OJS3
  2. The editor manually edits the XML. This means checking for mistakes, especially in citations, and adding some extra metadata. For the metadata part I have my own plugin (JatsFrontPuller: GitHub - ajnyga/JatsFrontPuller), which basically creates the whole <front> section of the JATS XML file (the part with the article/issue/journal metadata).
  3. For publishing, I have modified the OJS core to support artwork files (like images) in the case of XML galleys (Allow artwork files for XML galleys by ajnyga · Pull Request #1353 · pkp/ojs · GitHub)
  4. For visualizing JATS XML content I have added some new features to the lensGalley plugin (GitHub - ajnyga/lensGalley: Galley viewer plugin integrating eLife Lens for OJS 3.0: fork using the latest develop branch of Lens) and written a plugin that converts the JATS XML to HTML on the fly and shows it on the abstract page (GitHub - ajnyga/embedGalley: OJS3 plugin for visualizing JATS XML galleys).
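
As a rough illustration of what "creating the `<front>` section" in step 2 means, here is a Python sketch that builds a minimal `<front>` element from a few metadata values. It is not the JatsFrontPuller code; the function name and its parameters are invented for the example, and the result is not claimed to validate against any particular JATS DTD (a real file also needs things like ISSN, publisher, dates and issue details).

```python
import xml.etree.ElementTree as ET


def build_front(journal_title, article_title, authors, doi=None):
    """Build a minimal JATS <front> element.

    `authors` is a list of (surname, given_names) tuples.
    """
    front = ET.Element("front")

    # Journal-level metadata
    journal_meta = ET.SubElement(front, "journal-meta")
    journal_title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(journal_title_group, "journal-title").text = journal_title

    # Article-level metadata
    article_meta = ET.SubElement(front, "article-meta")
    if doi:
        ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"}).text = doi
    title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(title_group, "article-title").text = article_title

    contrib_group = ET.SubElement(article_meta, "contrib-group")
    for surname, given_names in authors:
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names
    return front
```

`ET.tostring(build_front(...), encoding="unicode")` returns the element as a string that could be pasted in front of the `<body>` produced by the conversion.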

The problematic part is, of course, the manual cleaning of the XML file you get from the PKP Open Typesetting Stack, and the most problematic part of that is the citations. In my opinion the best tool for cleaning the XML file is an honest text editor, but that could just be my ignorance of actual XML editors.

Another problematic part is creating a PDF from JATS XML. Here I am working on a plugin that uses CaSSius (GitHub - MartinPaulEve/CaSSius: CaSSius: a CSS-regions-based PDF typesetter for scholarly communications). I am thinking of combining it with the embedGalley plugin, which would output the article in HTML (with some nice CSS), and then implementing the CaSSius library there, so that the editor can print out a PDF version of the finished article. I hope to have a version of this ready by the end of this month. At the moment, though, CaSSius cannot handle cool two-column layouts like the one @Vitaliy has.

Thanks, ajnyga.
I am looking for software because I'm a dummy… It's hard to believe that everyone who publishes from JATS or NLM XML has the kind of technical skills we are talking about.

I've found information about some software used by publishers, such as eXtyles and Typefi, but I'm afraid these are quite expensive solutions and I couldn't try them. This is the kind of software that really seems made for dummies like me.

I also found this kind of solution:
FontoXML (which I haven't tried)

Oxygen XML has a built-in JATS framework, and marking up the JATS XML from a doc looks “easy” (up to a point). The transformation to HTML and PDF also looks easy if you don't want to change the framework's styles.
But before starting to learn Oxygen (or maybe Altova), which will take me quite a lot of time since it's not the kind of software graphic designers use, I wanted to ask about the solutions other publishers use.
We are a very small journal, but we would consider investing in software to shorten the learning curve.
But maybe there isn't a shortcut and I will have to learn to code :slight_smile:

Thanks a lot for all the information about the process and the plugins you are using; it's really helpful.

Hi,

Not everyone has those skills, but I know that a lot of journals use third-party service providers, for example in India, to produce XML.

Whatever service or program you use to convert the docx (or similar) to XML, the crucial thing is always how your docx is structured. If things like headings and citations are clearly marked, the automatic conversion with PKP OTS should be OK. Still, I find it hard to believe that with things like citations/references we will get to a point where no manual work on the XML file is needed. If the citations/references were saved in a database, the situation could be different, but who does that :smiley:

Anyway, the manual editing after the automatic conversion is something you could also do with an XML editor like Oxygen. For me, a simple text editor is easier, because not that many tags are actually used in the body and back parts of JATS XML, and a text editor has a more hands-on feel. But this is also partly because I do not know much about tools like Oxygen.

For the workflow I have tried to put together, Oxygen is nevertheless out of the question, because the tools should be free. Just in case you have not found this yet: http://jatswiki.org/wiki/Tools

Hi,
yes, some publishers usually use a third-party service.
I know the wiki's tools page and have been checking those tools, because I understand very well what you mean about free tools (I share the same opinion).
Besides, I really appreciate the great work PKP is doing to provide very good XML tools like OTS and Lens Viewer, and perhaps in the near future the Texture editor (I'm praying for that :grinning:).
Everyone should be grateful for that, and also for the great job all of you are doing, sharing your time and work to improve OJS.

So, sincere thanks for all your work and help.
Cheers.
…in the end there is nothing left but to roll up my sleeves and get down to work with code :sweat:

In our case we used Java's built-in regex tools to parse the in-text citations and the reference section. As long as authors use the AMA (Vancouver) citation style in their docx, the JATS produced by the transformation doesn't require any additional correction of the references. It is only a matter of a good regex pattern.
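
To illustrate the idea for numbered in-text citations, here is a small sketch in Python rather than Java. It is not our converter's code: the pattern, the function names and the `R1`, `R2`… id scheme for `rid` are assumptions made for the example, and a bracketed number that is not a citation (a year, say) would be tagged too.

```python
import re

# Bracketed numeric citations such as [1], [2, 3] or [4-6] (hyphen or en dash).
CITATION = re.compile(r"\[(\d+(?:\s*[-\u2013,]\s*\d+)*)\]")


def expand(citation_body):
    """Turn the inside of a citation, e.g. '2, 4-6', into [2, 4, 5, 6]."""
    numbers = []
    for part in citation_body.split(","):
        bounds = [int(n) for n in re.split(r"[-\u2013]", part)]
        if len(bounds) == 1:
            numbers.append(bounds[0])
        else:
            numbers.extend(range(bounds[0], bounds[-1] + 1))
    return numbers


def tag_citations(text):
    """Wrap every cited number in a JATS <xref ref-type="bibr"> element."""
    def replace(match):
        xrefs = [f'<xref ref-type="bibr" rid="R{n}">{n}</xref>'
                 for n in expand(match.group(1))]
        return "[" + ", ".join(xrefs) + "]"
    return CITATION.sub(replace, text)
```

Expanding ranges such as `[4-6]` into individual links is a design choice made here so that every cited reference gets its own `xref`; keeping the range as a single link would be equally defensible.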

Oxygen uses the TEIC stylesheets for the JATS to HTML transformation. The stylesheets are open source; Oxygen is not. These stylesheets are also used by meTypeset, which is built into OTS. So in this case Oxygen cannot provide a better result than the PKP project. Moreover, OTS adds useful features, such as two citation-parsing modules.

Yeah, I do believe that a well-formed docx would also work with PKP OTS and produce flawless references in JATS. I am just a bit sceptical when it comes to flawless docx files :smiley: In the traditional workflow (docx => InDesign => PDF) it does not really matter if you have some small mistakes there, but in this case it probably does. So a manual editing step will probably be needed in the future as well. But if/when Texture is ready, that phase could be a lot different from manually editing XML tags. (PS: I still think that a “source code” mode in Texture would be a good idea.)

Thanks, Vitaliy, for the information about the stylesheets in Oxygen, and the rest.
I'm realizing that maybe, with just some good training in the PKP tools, I will be able to handle the whole process.

Hi,

I have sketched the workflow I am working on here: GitHub - ajnyga/OJS3XMLWorkflow

I will probably add more detailed descriptions during May, and hopefully I will get the PDF plugin working as well.

Hi ajnyga,
Excellent news about the PDF plugin, and thanks very much for the clear workflow you have published.
I'm sure that it will be very helpful for a lot of editors (like me).
Cheers.

Hi @Vitaliy

I am interested in this plugin. Are you working on it on GitHub or somewhere else?
My PhD thesis is about mobile access and HTML-format articles to enhance the UX of reading articles on mobile.

Please let me know when you release a version of the plugin you're working on.

Best

Israel

Hi @israel.cefrin,

For now I am finishing my work on the JATS to LaTeX parser (Java): GitHub - Vitaliy-1/JATS2LaTeX: Java project that can be used for transformation of Journal Article Tag Suite XML to LaTeX
The PHP parser will be the same, except that it will produce HTML, not LaTeX: GitHub - Vitaliy-1/JATSParser: JATSParser is aimed to be integrated with Open Journal Systems 3.0+ for transforming JATS XML to various formats
I hope ajnyga will help me with the integration into OJS.

Although I only need to rewrite the code in another programming language, I code only in my free time, so the plugin will not be ready soon.

Hi @israel.cefrin,

The work on the plugin is finished. You can view an example on our site (already integrated into the article detail page). The concept is to record all the data from the JATS XML in PHP objects and then pass the data from those objects to Smarty templates. The plugin is on GitHub: GitHub - Vitaliy-1/JATSParserPlugin: OJS3 Plugin for parsing JATS XML and displaying it on article detail page

However, front-end display like on our site requires changes in article_detail.tpl and header.tpl. I will provide an example soon.
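
The two-step concept (XML into objects, objects into a template) can be sketched as follows. The plugin itself is PHP with Smarty templates; this is only a Python illustration, the `Article` and `Section` classes are invented for the example, and plain string formatting stands in for the template engine.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from html import escape


@dataclass
class Section:
    title: str
    paragraphs: list = field(default_factory=list)


@dataclass
class Article:
    title: str
    sections: list = field(default_factory=list)


def text_of(element):
    """Text content of an element with whitespace collapsed ('' if it is missing)."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def parse_jats(jats_xml):
    """Step 1: record the data from the JATS XML in plain objects."""
    root = ET.fromstring(jats_xml)
    title = root.find("./front/article-meta/title-group/article-title")
    article = Article(title=text_of(title))
    for sec in root.findall("./body/sec"):
        section = Section(title=text_of(sec.find("title")))
        section.paragraphs = [text_of(p) for p in sec.findall("p")]
        article.sections.append(section)
    return article


def render_html(article):
    """Step 2: pass the objects to a template (here, simple string formatting)."""
    parts = [f"<h1>{escape(article.title)}</h1>"]
    for section in article.sections:
        parts.append(f"<h2>{escape(section.title)}</h2>")
        parts.extend(f"<p>{escape(p)}</p>" for p in section.paragraphs)
    return "\n".join(parts)
```

Keeping the parsed data in objects means the same objects could feed more than one output, which is the reason for not writing HTML directly while reading the XML.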

Thanks for letting me know, @Vitaliy!!!
I will check it out. I think this is going to help me a lot with my doctoral thesis experiment.

Best
Israel

Hi! @Vitaliy

Sorry for my bad English.

I am using the plugin (GitHub - Vitaliy-1/JATSParserPlugin: OJS3 Plugin for parsing JATS XML and displaying it on article detail page) with PHP 5.6.3, OJS 3.0.2 and the “defaultManuscript” theme.

To convert DOCX to XML I used: DOCX to JATS XML converter.

The result can be seen here: https://ojstest.com/recaf/index.php/RCAF/article/view/42

The problems that appear are:

1.- Everything that belongs to “entry_details” appears at the bottom instead of being displayed on the right side.

2.- The reference links ([1], [2], etc.) do not respond.

I would like to ask @asmecher whether the previous versions of OJS (2.4.8.0, 2.4.8.1, 2.4.8.2) will be compatible with PHP 7 in the near future. See: Ensure PHP7 compatibility · Issue #1118 · pkp/pkp-lib · GitHub
I am asking because I am very interested in being able to display the articles in another format (besides PDF), and I am not sure whether I will have problems changing from PHP 5.6.3 to PHP 7.

I have one more query:

@Vitaliy, which version of lensGalley are you using, and where can I get it? I've downloaded lensGalley 2.0 (Release Lens 2.0.0 · elifesciences/lens · GitHub), and I've also tried @ajnyga's Release ojs-3_0_2-0 · ajnyga/lensGalley · GitHub, but it's currently not working for me (it does not show JPG images). I hope to find some help with this problem soon.

Thank you so much for this tremendous work!

Hello! @Vitaliy

Sorry for my bad English.

I have been able to find some answers to my questions.

1.- About this problem: everything belonging to “entry_details” appears at the bottom instead of being shown on the right side.

The solution was:

Edit overriding.css (removing pretty much everything).
You can see the result here: https://ojstest.com/recaf/index.php/RCAF/article/view/42

2.- When using the JATSParser plugin, content was duplicated (abstract, title1, title2, etc.).

The solution was:

Hide all the duplicated content with CSS:

h2#title2.title, div.panwrap.abstract {
    display: none;
}

3.- I also added more CSS so that the formatting is similar to the theme.

But I still have these problems. Could you help me solve them?

1.- The reference links ([1], [2], etc.) do not respond.

2.- I would like to ask @asmecher whether the previous versions of OJS (2.4.8.0, 2.4.8.1, 2.4.8.2) will be compatible with PHP 7 in the near future. See: Ensure PHP7 compatibility · Issue #1118 · pkp/pkp-lib · GitHub
I am asking because I am very interested in being able to display the articles in another format (besides PDF), and I am not sure whether I will have trouble changing from PHP 5.6.3 to PHP 7.

3.- @Vitaliy, which version of lensGalley are you using, and where can I get it? I downloaded lensGalley 2.0 (Release Lens 2.0.0 · elifesciences/lens · GitHub), and I also tested @ajnyga's Release ojs-3_0_2-0 · ajnyga/lensGalley · GitHub, but it currently does not work for me (it does not show JPG images). I hope I can find some help with this problem soon.

Thank you so much for this tremendous work!

My changes were in the master branch. The release you tagged is the same as the PKP release, but I did a release from my master branch here: Release dev-branch · ajnyga/lensGalley · GitHub