Who is who in JATS? (2018)

jats

#1

Abstract

This is a report on the state of the art of JATS that we started in 2017 and updated this year. The post assumes you know what JATS is (the Journal Article Tag Suite, a technology that, among many other things, can help you automate layout creation based on XML files for articles). It gives a global picture so you can discover which tools are available and how they relate to each other if you would like to implement this technology.

Introduction

Over the last three years we have been talking a lot about JATS. PKP has been a main actor in this movie, even before this recent interest, and our forum shows this growing interest: http://forum.pkp.sfu.ca/search?q=jats

A few journals we work with asked to start “JATSing” (we need a new verb :-)), so before starting to develop a new solution, we did some research to discover “who is who” in the JATS universe.

The result of this research is this summary of the “state of the art of JATS”. The graphic shows the whole puzzle, with all the pieces involved, to help us discover whether some parts are still uncovered.

We should warn you in advance that this was not an easy task, because some tools are incomplete, abandoned, or have changed their names two or three times… so please comment if you think we missed any relevant actors in this map or described something wrongly.

We divided the JATS workflow into 4 phases:

  1. SUBMISSION: the author sends his/her work (normally in DOCX or ODT) to the platform.
  2. CONVERSION: the originals are transformed to JATS (the XML format that evolved from the NLM DTD).
  3. EDITING: conversions are not perfect, so they need to be fixed, or final changes need to be included.
  4. PRESENTATION: JATS is converted into a standard format (HTML, PDF, EPUB…) or shown directly with JS helpers.

And here is the inventory of tools that, alone or in combination with others, look promising for our initial goal (covering the whole workflow):

1: Submissions

[A1] OJS 3 core: by default, OJS is able to work with originals in any format. DOC, DOCX, ODT, PDF and even JATS are accepted by the platform. Markdown originals should also be accepted, but we did not find much information about submitting them.

  • Authors: PKP

[A2] fidusWriter: there is a plugin to integrate Fidus Writer into OJS. Extending fidusWriter so that authors can submit papers directly in JATS seems feasible.

2: Conversion

[B1] myTypeset: command-line tool written in Python to convert DOCX to NLM/JATS XML. It is a fork of the OxGarage stack and is implemented as part of the PKP XML parsing stack.

[B2] Pandoc: command-line tool written in Haskell that converts many formats to JATS.
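Here is a rough illustration of where Pandoc fits into the conversion phase. The Python sketch below only builds the command line for a DOCX, ODT or Markdown original and does not run it. It assumes a pandoc release in which `jats` is an alias for the `jats_archiving` writer (pandoc 2.9 and later also provide `jats_publishing` and `jats_articleauthoring`). The helper name `pandoc_to_jats_cmd`, the reader table and the file names are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Pandoc reader names for the original formats mentioned above (hypothetical selection).
READERS = {".docx": "docx", ".odt": "odt", ".md": "markdown"}


def pandoc_to_jats_cmd(source: str, target: Optional[str] = None) -> List[str]:
    """Return the pandoc argument list for converting `source` to JATS XML."""
    src = Path(source)
    reader = READERS.get(src.suffix.lower())
    if reader is None:
        raise ValueError(f"no pandoc reader configured for {src.suffix!r}")
    out = target if target is not None else str(src.with_suffix(".xml"))
    return [
        "pandoc",
        f"--from={reader}",
        "--to=jats",     # alias for jats_archiving in recent pandoc releases
        "--standalone",  # full <article> with <front>, not a body fragment
        f"--output={out}",
        str(src),
    ]


if __name__ == "__main__":
    cmd = pandoc_to_jats_cmd("paper.docx")
    print(shlex.join(cmd))
    # With pandoc on PATH, the conversion itself would be:
    # import subprocess; subprocess.run(cmd, check=True)
```

Running it prints `pandoc --from=docx --to=jats --standalone --output=paper.xml paper.docx`. Passing the list to `subprocess.run` performs the real conversion if pandoc is installed.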

[B3] DOCX2JATS: Java project aimed at facilitating DOCX to JATS XML transformation for scientific articles.

[B4] JatsFrontPuller: OJS3 plugin for generating the JATS XML front section from OJS metadata. The author says it is no longer needed because its functionality is included in the ojs3-markup plugin.
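The idea of generating the `<front>` section from platform metadata can be made concrete with a minimal Python sketch that uses only `xml.etree.ElementTree`. It is not how JatsFrontPuller or ojs3-markup are implemented. The metadata dict and its keys are hypothetical stand-ins for OJS metadata. `journal-meta` and other parts required for a complete document are omitted, so the output has not been validated against a JATS DTD.

```python
import xml.etree.ElementTree as ET


def build_front(meta: dict) -> ET.Element:
    """Build a minimal JATS <front>/<article-meta> from a metadata dict.

    The keys used here (doi, title, authors, abstract) are hypothetical
    stand-ins for OJS metadata fields.
    """
    front = ET.Element("front")
    article_meta = ET.SubElement(front, "article-meta")

    # JATS puts <article-id> before <title-group> inside <article-meta>.
    if meta.get("doi"):
        doi = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
        doi.text = meta["doi"]

    title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(title_group, "article-title").text = meta["title"]

    # One <contrib> per author; authors are (given names, surname) pairs.
    authors = meta.get("authors", [])
    if authors:
        contribs = ET.SubElement(article_meta, "contrib-group")
        for given, surname in authors:
            contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
            name = ET.SubElement(contrib, "name")
            ET.SubElement(name, "surname").text = surname
            ET.SubElement(name, "given-names").text = given

    if meta.get("abstract"):
        abstract = ET.SubElement(article_meta, "abstract")
        ET.SubElement(abstract, "p").text = meta["abstract"]

    return front


if __name__ == "__main__":
    front = build_front({
        "doi": "10.1234/example.1",  # hypothetical DOI
        "title": "Who is who in JATS?",
        "authors": [("Ann", "Smith")],
        "abstract": "A short abstract.",
    })
    print(ET.tostring(front, encoding="unicode"))
```

Keeping this generation step automatic is what avoids hand-edited front matter drifting away from the platform's metadata.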

[B5] PeerJ JATS conversion: not a tool ready to use “out of the box”, but there you can find some helper files from PeerJ to convert, fetch and validate JATS.

3: Editing

[C1] Substance/Texture (aka LensWriter): it is confusing and difficult to understand why the name changed so many times. In short, Substance is an “abstract” web editor that is “instanced” as Texture to be JATS compliant (long ago, it was also called LensWriter). It is still a little buggy, but it is the most promising free software JATS editor available.

[C2] Authorea (proprietary): … (to do)

[C3] Overleaf (proprietary): … (to do)

[C4] FontoXML (proprietary): an XML editor that includes a subset for editing JATS. It is suggested as an option in forums, but there is no OJS integration.

4: Presentation

[D1] embedGalley: OJS3 plugin that automatically converts a JATS XML galley to embedded HTML, which is shown on the abstract page. The plugin uses some code and XSL files from [B7].

[D2] OAI JATS: OJS3 plugin to expose JATS XML via the OAI-PMH interface.
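For readers unfamiliar with OAI-PMH: a harvester fetches a single record with a `GetRecord` request. The request carries the standard `verb`, `identifier` and `metadataPrefix` parameters. The Python sketch below only builds such URLs and sends nothing. The endpoint, the identifier and the assumption that the plugin exposes records under the `jats` prefix are all hypothetical. A `ListMetadataFormats` request shows which prefixes a repository actually supports.

```python
from urllib.parse import urlencode


def oai_url(base_url: str, **params: str) -> str:
    """Append URL-encoded OAI-PMH query parameters to a repository base URL."""
    return f"{base_url}?{urlencode(params)}"


def getrecord_url(base_url: str, identifier: str, prefix: str = "jats") -> str:
    # "jats" is an assumed metadataPrefix; check ListMetadataFormats first.
    return oai_url(base_url, verb="GetRecord", identifier=identifier, metadataPrefix=prefix)


def list_formats_url(base_url: str) -> str:
    # Standard OAI-PMH verb listing the metadata prefixes a repository offers.
    return oai_url(base_url, verb="ListMetadataFormats")


if __name__ == "__main__":
    base = "https://journal.example.org/index.php/demo/oai"  # hypothetical endpoint
    print(list_formats_url(base))
    print(getrecord_url(base, "oai:journal.example.org:article/42"))  # hypothetical id
```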

[D3] Lens Galley: galley viewer plugin for OJS 3.0 that integrates eLife Lens.

[D4] JATSParser: OJS3 plugin that parses JATS XML and displays it on the article detail page. To display correctly, it should be used together with a modified version of the default Manuscript theme, or the user’s theme should be changed accordingly.
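Presentation plugins like these essentially walk the JATS body and map its elements to HTML; embedGalley does this with XSL. The Python sketch below shows the idea using only the standard library and a deliberately tiny, hypothetical element mapping. It is not the code of either plugin, and it ignores tables, figures, references and nested section levels.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny JATS-to-HTML element mapping.
TAG_MAP = {"sec": "section", "title": "h2", "p": "p", "italic": "em", "bold": "strong"}


def jats_to_html(node: ET.Element) -> str:
    """Render a JATS element and its descendants as an HTML string.

    Text is HTML-escaped; elements missing from TAG_MAP are unwrapped,
    i.e. their text and children are kept without a surrounding tag.
    """
    parts = [html.escape(node.text or "")]
    for child in node:
        parts.append(jats_to_html(child))
        # Text after a child element belongs to the parent (ElementTree "tail").
        parts.append(html.escape(child.tail or ""))
    inner = "".join(parts)
    tag = TAG_MAP.get(node.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


if __name__ == "__main__":
    body = ET.fromstring(
        "<body><sec><title>Introduction</title>"
        "<p>JATS is <italic>XML</italic> for articles.</p></sec></body>"
    )
    print(jats_to_html(body))
```

The example prints `<section><h2>Introduction</h2><p>JATS is <em>XML</em> for articles.</p></section>`. The unmapped `<body>` wrapper is dropped.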

*: JATS suites

Only a combination of 3 tools covers the whole JATS workflow:

[Z1] OTS + ojs3-markup + Lens Galley plugin: PKP offers a combination of tools to cover conversion, editing and presentation. With this modular “unix-like” approach you can decide which components you want to use.

Conclusion

I am still looking for time to test it all, but after some preliminary testing and some reading, my personal conclusion is that the only complete free software solution covering the whole publication workflow is the PKP OTS-markup-Lens suite.

It is complete, it is free software, it comes with the PKP seal, and it looks mature enough to be used in production environments.

Have you tested any of them? Comments?


#2

Nice list. I agree that Lens works fairly well, but it does have some major problems. For example, it is not mobile friendly, which pretty much ruins the whole idea of an HTML layout article. @Vitaliy’s converter is probably the best you can find. I will probably continue maintaining embedGalley for a while. What I would like to see is support for article download statistics. I mean a plugin that would:

  1. Show the beginning of a HTML article under the abstract with a “read more” button
  2. hitting the button would add a hit to the article download statistics
  3. Google and other similar engines could still index the whole article content without hitting the button, or the button would be accessible to them

This is because, at the moment, journals using plugins like embedGalley do not get COUNTER-compliant statistics: everything is counted as abstract hits only.
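The three requirements above can be modelled roughly in Python. This is a hypothetical sketch, not an existing OJS plugin or API. In this model the full text is sent with the page, so crawlers can index it, but everything after a short teaser is collapsed, and only the "read more" click is counted. A real implementation would also have to apply the COUNTER processing rules, such as excluding robots and filtering double clicks, before those hits could count as compliant statistics.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadMoreView:
    """Hypothetical model of the "read more" behaviour, not an OJS API."""

    teaser_size: int = 2  # paragraphs shown under the abstract (step 1)
    full_text_hits: Counter = field(default_factory=Counter)

    def render(self, paragraphs: List[str]) -> Dict[str, List[str]]:
        # Steps 1 and 3: the whole text is sent with the page, but only the
        # teaser is visible; the rest sits in a collapsed block crawlers can read.
        return {
            "visible": paragraphs[: self.teaser_size],
            "collapsed": paragraphs[self.teaser_size :],
        }

    def read_more(self, article_id: int) -> None:
        # Step 2: only the button click is recorded as a full-text hit.
        self.full_text_hits[article_id] += 1


if __name__ == "__main__":
    view = ReadMoreView(teaser_size=1)
    print(view.render(["First paragraph.", "Second.", "Third."]))
    view.read_more(42)
    print(view.full_text_hits[42])
```

Viewing the abstract page records nothing in this model. Only the click registers as a full-text hit.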

Also, as you noticed, the jatsFrontPuller plugin is definitely not needed anymore and will probably not work with the latest OJS. I made it because OTS did not use much of the OJS metadata in the front section of the JATS document.

In my opinion, that part of the document should always be automatically synced from the OJS database and should not be open for any editing in the XML document itself. I am thinking of Texture here. Alternatively, editing the metadata in the document using Texture should update the OJS metadata as well. The bottom line is that situations where the JATS metadata and the OJS metadata differ should be avoided, or journals will run into trouble later.


#3

JATS Parser plugin v.2 is almost ready. It is compliant with every theme that uses Bootstrap 4. Apart from JATS to HTML transformation, it also generates PDF.


#4

This is a great resource, Marc! Many thanks. It’s worth noting that PKP is in the process of evaluating Grobid as a conversion utility. It shouldn’t be listed here just yet, as it is currently strictly a doc->TEI conversion utility, but it’s already part of the OTS workflow and we’ll be evaluating it as a standalone conversion option as well.

Cheers,
James


#5

Thank you Marc for this very useful compilation!

Since 2012 we have used a commercial product to convert Word to JATS (the tool actually does more than that, e.g., checking references against databases and in-text citations against the reference list, parsing references, etc.), because we did not find a satisfactory open source solution for that task. Perhaps this has changed since, or will change in the near future? One problem in our workflow was syncing the OJS metadata with our JATS metadata. In our workflow we consider the JATS metadata to be the final, valid metadata, and we update the OJS metadata from the JATS document. For OJS 2.x, PKP developed a small plugin for us that reads a JATS document and (after review) overwrites the OJS metadata. So I slightly disagree with @ajnyga that the JATS document should always be synced from the OJS database (in our workflow it’s the other way round), but I strongly agree that synchronization between the metadata in JATS and OJS is very important to avoid inconsistencies.
Another development I would like to see and use is a working proofreading solution, i.e., a JATS-based editor like Substance/Texture that lets authors make corrections directly on the JATS source (including track changes). This would avoid creating PDF proofs, collecting comments from authors and then manually transferring those corrections into JATS again. My impression, however, is that current solutions are not ready to be used in production with demanding content such as complex tables, equations, statistics/MathML, etc. Does anyone have experience using such JATS/XML authoring tools for proofreading?

Thanks again for this post!

Best wishes

Armin


#6

Just a note that I was mainly thinking of the Texture editor integration that you can use in OJS to edit JATS XML files. In some versions of the Texture integration you could also edit the front part (metadata) of the JATS document, and the changes you made to the file were not synced to the OJS metadata, so you ended up with two sets of metadata. This may have changed since.

Your case is a bit different because, if I understood correctly, you create the JATS XML outside OJS and upload it as a galley file.

So I can see three cases that need synchronization:

  1. When you upload a JATS XML galley file, OJS should ask if you want to update the OJS metadata
  2. When you edit the OJS metadata, OJS should ask if you want to update the JATS XML metadata (if JATS galleys exist)
  3. When you edit JATS XML with Texture, you should not be allowed to edit the metadata of the file (or if you are allowed, changes there should be synced to the OJS metadata)
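All three cases start with the same step: detecting when the JATS metadata and the OJS metadata disagree. Here is a minimal Python sketch of that comparison. It assumes the OJS side is available as a plain dict, which is hypothetical because real OJS keeps its metadata in the database, and it looks only at the article title and the author names.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Tuple


def jats_metadata(jats_xml: str) -> Dict[str, Any]:
    """Read the article title and author names from a JATS <front>.

    Only the direct text of <article-title> is read; inline markup such as
    <italic> inside the title is not flattened in this sketch.
    """
    root = ET.fromstring(jats_xml)  # the <article> element
    meta = root.find("front/article-meta")
    if meta is None:
        raise ValueError("document has no front/article-meta")
    authors = [
        (name.findtext("given-names", default="").strip(),
         name.findtext("surname", default="").strip())
        for name in meta.findall("contrib-group/contrib[@contrib-type='author']/name")
    ]
    return {
        "title": meta.findtext("title-group/article-title", default="").strip(),
        "authors": authors,
    }


def metadata_conflicts(ojs: Dict[str, Any], jats: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each field present in `jats` whose OJS value differs to (ojs, jats)."""
    return {key: (ojs.get(key), value) for key, value in jats.items() if ojs.get(key) != value}
```

A non-empty result is where OJS would ask which side to keep (cases 1 and 2), or where a Texture integration would either block the edit or propagate it (case 3).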

#7

Sorry for the silence. Summer vacations are sacred. ;-)

Thanks a lot for your comments. So to summarize:

  • In the near future, OTS will take advantage of Grobid.
  • Lens is not mobile friendly.
  • If you use embedGalley, you will miss hits in your statistics because it is not COUNTER compliant (yet?)
  • JATS Parser is now v.2 and:
    • … is compliant with every Bootstrap 4 theme
    • … generates HTML and PDF from JATS (using transformations).

I would love to know what your best set of tools would be today.

In the article I suggested using the full PKP suite, but if I understand you correctly (@ajnyga), you are suggesting replacing Lens with JATS Parser, right?


#8
  • embedGalley misses statistics hits because it shows the article on the abstract page. No downloads of the article are recorded, because no one actually downloads the XML galley file. This applies to any plugin that embeds the full text of the article on the abstract page. There are several ways to solve this, but we only have one journal using JATS, so it has been a low priority.

  • The journal of ours that uses JATS actually stopped using Lens because they had problems with the way it handled references: for example, some references that existed in the JATS file were missing. So yes, the JATS Parser plugin is definitely better imho. But as I said, this is a low priority overall, at least for now.