Current status of PDF to PMC XML conversion


At one point, PKP was involved in the Open Typesetting Stack:
"Open Typesetting Stack
PKP has created a standalone service for converting Microsoft Word and PDF documents to
structured National Library of Medicine JATS XML (used by PLoS and PubMed Central), and from
there, creating attractive HTML, PDF, and ePub article views from the XML. This service is
intended to decrease the labour involved in typesetting, and to facilitate the creation of
archive-friendly and web-native article formats. It is called Open Typesetting Stack, or OTS for short. The
OTS service is being integrated into OJS 3 as a plugin and is developed alongside Substance
Software’s Texture WYSIWYG XML editor. More details are here."

I have just checked Substance Software’s Texture website and the PKP link, but both are dead.
Do you know what happened, or what the current status of this development is?


I cannot say anything about the Open Typesetting Stack.

However, Substance’s Texture is probably dead. There has been no commit since August 2019 (GitHub: substance/texture, “A visual editor for research”). Substance has probably moved towards developing a proprietary solution, but I do not know. Still, the latest release on GitHub is usable.

Currently, there is a lot of buzz about converting Word to JATS-XML. One of my journals even uses such a workflow. But be aware that this is no “I push a button and everything works” solution. At the moment, the conversion is a long manual process, since the outcome of ANY automatic conversion is … hm … let’s say “not optimal”.

You also need to keep all your galleys in sync (the PDF and the JATS-XML must have exactly the same text content). This is another thing to bear in mind.
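To illustrate the "galleys in sync" point, here is a rough sketch of how one might compare the text of a JATS-XML galley with text already extracted from the PDF galley. It uses only the Python standard library. The function names (`normalize`, `jats_body_text`, `galley_similarity`) are made up for this example, and it assumes the PDF text has been extracted beforehand with some other tool and that the JATS file has a plain `<body>` element without namespaces. It is not part of any PKP workflow.

```python
import difflib
import re
import xml.etree.ElementTree as ET


def normalize(text: str) -> str:
    """Drop all whitespace and lowercase, so line breaks and layout are ignored."""
    return re.sub(r"\s+", "", text).lower()


def jats_body_text(jats_xml: str) -> str:
    """Return the normalized text content of the JATS <body> element."""
    root = ET.fromstring(jats_xml)
    body = root.find("body")
    if body is None:
        return ""
    return normalize("".join(body.itertext()))


def galley_similarity(jats_xml: str, pdf_text: str) -> float:
    """Similarity ratio between 0.0 and 1.0; 1.0 means identical normalized text."""
    a = jats_body_text(jats_xml)
    b = normalize(pdf_text)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
```

A ratio clearly below 1.0 would only be a hint that the two galleys have drifted apart. Hyphenation at line ends, running headers, and footnotes in the PDF will lower the score even when the article text is the same, so a manual check is still needed.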

There is a running project that tries to solve these issues: Modern Publishing. Perhaps it helps you!



@GrazingScientist Thank you so much for all the help and the details!


Hi @ojspkpuser,

About JATS XML Workflow in general:

Regarding PDF to JATS XML in particular, I’d recommend using GROBID (see the GROBID documentation). One problem is that it requires some expertise and converts PDF to TEI XML, so an additional tool is required to get JATS. I have a fork that can convert to JATS directly, although I haven’t updated it for a while. It can be installed as a service; unfortunately, I don’t have a running web instance right now to show an example.
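To give an idea of what the "additional tool" step involves, here is a minimal sketch that maps a few TEI elements (main title, section heads, paragraphs) to a bare JATS skeleton with the Python standard library. This is not how the fork mentioned above works; the function name `tei_to_jats` and the choice of elements are assumptions for illustration, and a real conversion has to handle authors, references, figures, tables, inline markup and much more.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_jats(tei_xml: str) -> str:
    """Convert title, section heads and paragraphs from TEI to a JATS-like skeleton."""
    tei = ET.fromstring(tei_xml)

    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    article_title = ET.SubElement(title_group, "article-title")

    # Article title: first <title> inside <titleStmt>, if present.
    title = tei.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is not None:
        article_title.text = "".join(title.itertext()).strip()

    # Body: each top-level TEI <div> becomes a JATS <sec>.
    body = ET.SubElement(article, "body")
    for div in tei.findall(".//tei:text/tei:body/tei:div", TEI_NS):
        sec = ET.SubElement(body, "sec")
        head = div.find("tei:head", TEI_NS)
        if head is not None:
            ET.SubElement(sec, "title").text = "".join(head.itertext()).strip()
        for p in div.findall("tei:p", TEI_NS):
            # Inline markup (references, formatting) is flattened to plain text here.
            ET.SubElement(sec, "p").text = "".join(p.itertext()).strip()

    return ET.tostring(article, encoding="unicode")
```

The output is not valid JATS on its own (no DOCTYPE, no journal metadata, no back matter), so it should be read as a starting point that still needs manual editing and validation.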


Hi @Vitaliy,

Thank you very much for the suggestions and all the help! The workflow was very useful, and I’ll give Grobid a try. I was surprised to see a machine learning approach to this!

I also want to say that I really appreciate your involvement in the oldGregg theme and other OJS-related projects! Thank you for all these contributions!