I’m sorry if this is the wrong forum to ask, but I read this article by Parth Sarin, which sounds promising! However, I was not able to find any information about what’s technically behind all of this.
What technology do we have to use to produce PDFs/HTMLs/XMLs in the contexts mentioned in the article (“Minimal HTML” or “own JATS XML”)? What PDF rendering mechanism is suggested (XSL-FO or CSS3/4 with some PDF renderer …)?
Where can I get some more technical background?
I’d be glad to read some more in-depth information!
My understanding is that there are a lot more improvements coming in this area in 3.6. We have an upcoming webinar on 3.6 development that might be a good opportunity to learn about this and ask questions: https://pkp.sfu.ca/2025/10/03/webinar-coming-in-ojs-3-6/
As for the present state of things, @Vitaliy might be able to speak to this (or suggest someone who can).
And just a quick note to add that some of the more recent discussions of JATS amongst our developer community can be found on GitHub.
That’s an interesting view on the issues regarding JATS.
I’m aware of @Vitaliy’s DOCX converter, which is surely very useful for some journals. Back then I decided not to use it, because our journal was too complex for it.
Nonetheless, we do have some other journals that now use (or will soon use) OJS and are a much better match for those tools.
In Parth Sarin’s article I read about an “HTML rich text editor” (RTE) and a “Minimal HTML” approach. As far as I understood, the goal is to convert a DOCX into the RTE after the whole review process and to finalize (i.e. copyedit) it there. From there, all galleys (PDF, HTML, JATS) could be generated “with a simple click”.
That would be a very welcome production step, which would make an external typesetter redundant and would be a perfect fit for one or two of our journals.
I’m therefore very interested in which tech stack will be used for all this conversion work and how we could prepare and optimize our layout to move in this direction.
I don’t think we will get to the point of feeding JATS into OJS. In my experience, it is very hard for editors and copyeditors (not to mention authors) to adapt to an XML-centric workflow.
Thanks for your additional comments. I share some of your understanding of, and reservations about, JATS. I am also very interested to learn more, but don’t have enough context to comment at this point. It is my understanding that a lot of work related to JATS is currently underway as part of the Open Research Europe project, and that this includes assessing the tech stack that will be used. I’m hopeful that @Vitaliy or another dev team member can provide more details.
We’re just finalizing some plans for this that we can communicate in more detail.
In short – we’re working with the team behind OS-APS to break that monolithic application into a more neutral-stack body text editor, which we’ll be integrating into OJS. (OS-APS is an Open Source conversion/editing/typesetting toolset deriving from SciFlow.)
The import conversion, which ingests a word processor document for processing in the editor, is not yet as fully formed, but we’ll be piloting an approach in the near future.
For the export conversion (HTML and PDF), we’re looking at generating the HTML for presentation within OJS using a fairly simple approach, then using Vivliostyle to typeset that into PDF.
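To make the HTML-then-PDF idea a bit more concrete, here is a minimal sketch of what "fairly simple" HTML plus paged-media CSS could look like. This is not the actual PKP implementation: the helper name `build_galley_html`, the page size, the margins, and the page-number placement are all assumptions made for illustration. Only standard `@page` rules from CSS Paged Media are used, which is the kind of stylesheet a PrintCSS engine such as Vivliostyle is designed to consume.

```python
from html import escape

# Paged-media CSS using standard @page rules (CSS Paged Media).
# Page size, margins, and the page-number placement are illustrative
# assumptions, not values taken from OJS or Vivliostyle.
PRINT_CSS = """
@page {
  size: A4;
  margin: 25mm 20mm;
  @bottom-center { content: counter(page); }
}
h1 { break-after: avoid; }
p { orphans: 2; widows: 2; }
"""


def build_galley_html(title: str, paragraphs: list[str], css: str = PRINT_CSS) -> str:
    """Return a small standalone HTML document (hypothetical helper)."""
    # Escape all text so that characters like & and < stay valid HTML.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{escape(title)}</title>\n"
        f"    <style>{css}</style>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(build_galley_html("Sample article", ["First paragraph.", "A & B < C"]))
```

The same HTML file can be shown in a browser as the HTML galley and handed to a PrintCSS engine for pagination, so only the stylesheet needs to know about pages.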
Regards,
Alec Smecher
Public Knowledge Project Team
Thanks @asmecher for this first tech stack impression of things to come!
I have been working for several years now with PrinceXML (commercial licence) to convert HTML to PDF, so I am familiar with paged-media CSS. Yesterday evening I tried to get a rough overview of Vivliostyle for PDF creation, and I’d like to share a few initial thoughts about it:
1. Vivliostyle doesn’t support any PDF/UA standard, which I will definitely miss.
2. It does support PDF/X-1a, which is better than nothing in the publishing industry, but is sadly well behind the times.
3. I haven’t found a way to use color profiles for PDF/X-1a.
While I can live with 2 and 3 for a while (even though it pains my typesetter’s heart), 1 is really sad to know. You’ve invested quite a bit of work into good accessibility of the OJS frontend, so it would be consistent to make the available PDF galleys PDF/UA-compliant.
Given these points, I’d like to encourage you to consider making it possible to connect the rich text editor/JATS (etc.) part with an external tool (e.g. PrinceXML) or an external API (e.g. the DocRaptor API), so that PDF galleys can be generated externally in an automated way and OJS users have a choice.
It would also be nice to have a flexible approach to galley creation: e.g. if I can choose to let the production workflow do the JATS and HTML part, but can upload an externally generated PDF galley (instead of using the “inline production” of PDF).
It would also be nice if EPUB production were implemented, too. (As far as I understood, this shouldn’t be a problem with Vivliostyle.)
As far as I can see, none of this will make it into OJS before 3.6, and even then the implementation will be challenging. Do you have specific plans regarding the timeline for these new features?
Nevertheless I am quite curious to see what the implementation will look like.
Thanks for the feedback! This kind of thing is very valuable, especially at this point in development.
My go-to for information on PDF generation has been https://print-css.rocks/. It has a side-by-side profile of several PrintCSS engines, including PrinceXML and Vivliostyle, and an attempt at engine-neutral markup and CSS for many of the sorts of things we’ll need to do to typeset PDFs using HTML – pagination, columns, footnotes, etc.
We are hoping that the actual choice of engine can be swappable. For those who are able (and need) to use a commercial solution, they can do that; however, for the community at large, we would like to incorporate something more FOSS friendly.
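As a rough illustration of what a "swappable" engine could mean in practice, here is a minimal sketch of a registry of interchangeable HTML-to-PDF engines. Everything in it is hypothetical: the names `register_engine`, `render_pdf`, and `placeholder_engine` are invented for this example and do not reflect the OJS plugin API or any real engine's interface. The placeholder does not produce a valid PDF; a real adapter would wrap Vivliostyle, PrinceXML, or another engine behind the same function signature.

```python
from typing import Callable, Dict

# An "engine" here is any callable that turns an HTML string into PDF bytes.
PdfEngine = Callable[[str], bytes]

# Hypothetical registry mapping a configured name to an engine.
_ENGINES: Dict[str, PdfEngine] = {}


def register_engine(name: str, engine: PdfEngine) -> None:
    """Make an engine available under a name chosen by the site admin."""
    _ENGINES[name] = engine


def render_pdf(html: str, engine_name: str) -> bytes:
    """Render HTML with whichever engine the installation has selected."""
    try:
        engine = _ENGINES[engine_name]
    except KeyError:
        raise ValueError(f"Unknown PDF engine: {engine_name!r}") from None
    return engine(html)


def placeholder_engine(html: str) -> bytes:
    # Stand-in only: this does NOT create a real PDF. It marks the spot
    # where an adapter for an actual PrintCSS engine would do its work.
    return b"%PDF-placeholder\n" + html.encode("utf-8")


register_engine("placeholder", placeholder_engine)

if __name__ == "__main__":
    print(render_pdf("<p>Hello</p>", "placeholder"))
```

With a design along these lines, the calling code only ever sees `render_pdf`, so a journal that has a commercial licence and one that relies on a FOSS engine would differ only in which name is configured.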
Our main contenders for the community at large were Vivliostyle and WeasyPrint. Between these, we are likely to go ahead with Vivliostyle because of another tricky requirement: we need to avoid both complex server-side requirements (which would prevent the majority of our community from running the conversion server-side) and dependencies on external APIs for transformation (which might not be sustainable).
With Vivliostyle, we are able to have the best of all worlds by running the transformation engine right in the typesetting editor’s web browser, and passing the resulting PDF back to OJS. No exotic server-side requirements, and no third-party API to depend on.
The tl;dr: We’ve heard that PrinceXML is the most capable engine, but because it is a commercial product and requires server-side tools that most of our community can’t run, we can’t integrate it for the community at large. Vivliostyle will hopefully meet those needs, and those who can run PrinceXML will be able to continue to do so.
Thanks,
Alec Smecher
Public Knowledge Project Team
I know https://print-css.rocks/, which is sadly now in legacy mode. It is/was a very good starting point.
I prefer FOSS for the community, too. And this point:
“With Vivliostyle, we are able to have the best of all worlds by running the transformation engine right in the typesetting editor’s web browser, and passing the resulting PDF back to OJS.”
is also crucial. So choosing Vivliostyle is the best option right now – I can clearly see that.
My comments were intended more as suggestions about which improvements might be worth considering when extending Vivliostyle for future use in OJS. Accessibility is a particularly important issue (here in Germany, too, there is now a legal basis for the accessibility of digital products such as PDFs or EPUBs).
I’m sad to hear that print-css.rocks is not active any more! It is a great resource.
This is outside my area of expertise, but there does appear to be at least partial support for PDF/UA in Vivliostyle (per an issue in their GitHub repo). Accessibility is certainly important. We’re likely to support several engines for generating PDFs from HTML+CSS, and each will have its own mixture of compromises on license, features (including accessibility), and deployment requirements.
Thanks,
Alec Smecher
Public Knowledge Project Team