JATS Parser v2.0: JATS XML to HTML and PDF conversion

JATS Parser plugin version 2.0.1 has been released. It’s a compatibility release for OJS 3.1.2: https://github.com/Vitaliy-1/JATSParserPlugin/releases/tag/2.0.1

Hi @GabeLon

I’ve fixed this problem in the new release, but the fix works only with OJS 3.1.2.

Hi, PDF conversion is still not working: it returns an HTTP ERROR 500. I’m using the Health Sciences theme on OJS 3.1.2.

Error log:

> [24-Mar-2019 06:38:41 UTC] PHP Warning:  Declaration of SubmissionKeywordEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL) should be compatible with ControlledVocabEntryDAO::getByControlledVocabId($controlledVocabId, $rangeInfo = NULL, $filter = NULL) in /home/u796555399/public_html/journals/jcrs/lib/pkp/classes/submission/SubmissionKeywordEntryDAO.inc.php on line 20
> [24-Mar-2019 06:38:41 UTC] PHP Warning:  require_once(/home/u796555399/public_html/journals/jcrs/plugins/generic/jatsParser/JATSParser/src/JATSParser/PDF/../../../vendor/tecnickcom/tcpdf/tcpdf.php): failed to open stream: No such file or directory in /home/u796555399/public_html/journals/jcrs/plugins/generic/jatsParser/JATSParser/src/JATSParser/PDF/TCPDFDocument.php on line 5
> [24-Mar-2019 06:38:41 UTC] PHP Fatal error:  require_once(): Failed opening required '/home/u796555399/public_html/journals/jcrs/plugins/generic/jatsParser/JATSParser/src/JATSParser/PDF/../../../vendor/tecnickcom/tcpdf/tcpdf.php' (include_path='.:/home/u796555399/public_html/journals/jcrs/classes:/home/u796555399/public_html/journals/jcrs/pages:/home/u796555399/public_html/journals/jcrs/lib/pkp:/home/u796555399/public_html/journals/jcrs/lib/pkp/classes:/home/u796555399/public_html/journals/jcrs/lib/pkp/pages:/home/u796555399/public_html/journals/jcrs/lib/pkp/lib/adodb:.:/opt/alt/php71/usr/share/pear') in /home/u796555399/public_html/journals/jcrs/plugins/generic/jatsParser/JATSParser/src/JATSParser/PDF/TCPDFDocument.php on line 5
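The first warning is a PHP compatibility notice from OJS core; the fatal error is a missing file. `TCPDFDocument.php` loads TCPDF through a relative path into a `vendor` directory, and PHP reports "No such file or directory". Collapsing the `..` segments shows where the file was expected. A minimal Python sketch, assuming you can run it on the server (the helper name `expected_tcpdf_path` is hypothetical):

```python
import os
import posixpath

# Relative path that TCPDFDocument.php passes to require_once (copied from the error log)
TCPDF_RELATIVE = "../../../vendor/tecnickcom/tcpdf/tcpdf.php"


def expected_tcpdf_path(pdf_dir: str) -> str:
    """Collapse the '..' segments to show where tcpdf.php is expected to live."""
    return posixpath.normpath(posixpath.join(pdf_dir, TCPDF_RELATIVE))


if __name__ == "__main__":
    # Directory containing TCPDFDocument.php, taken from the log above
    pdf_dir = (
        "/home/u796555399/public_html/journals/jcrs/plugins/generic/"
        "jatsParser/JATSParser/src/JATSParser/PDF"
    )
    target = expected_tcpdf_path(pdf_dir)
    print(target)
    # False here means the vendor/ directory is missing from the installed plugin
    print("exists:", os.path.isfile(target))
```

The path resolves to `JATSParser/vendor/tecnickcom/tcpdf/tcpdf.php` inside the plugin directory. If that file is absent, the installed package is missing its bundled dependencies, which points to a packaging problem rather than to the server configuration.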

Hi @Vitaliy,
I installed the parser, but now the plugin window no longer opens (I use OJS 3.1.1.4). I only see the loading spinner turning.


How can I fix this?

Thanks
Best!

Tiziano

Hi @Tiziano,

What release are you using? Keep in mind that the latest release (or master branch) is compatible only with OJS 3.1.2.

Hi @Vitaliy,
I downloaded this version: [screenshot of the downloaded release], and my OJS version is 3.1.1.4.

Hmm, can you review the error log for errors or fatal errors?

Hi @varshilmehta,

Sorry for the delay. This problem was simply related to packaging the release. Can you test this new one? https://github.com/Vitaliy-1/JATSParserPlugin/releases/tag/2.0.1.1


Hi @Vitaliy, works great on 3.1.2!
Awesome work.
Many thanks!!


It works now. Thanks a ton.

Hi @Vitaliy, JATS Parser now works with my XML. At first it didn’t render images and tables, but when I moved them from the bottom of the XML file into the body, where they are cited, they were displayed.

I’d like to report two things. First, the link <xref ref-type="fig" rid="fig001">Figure 1</xref> does not work, and I can’t understand why. Second, when I click the PDF link to convert the article, I get this error:

TCPDF ERROR: [Image] Unable to get the size of the image: https://audiologyresearch.org/index.php/audio/article/download/214/252/2042

Is there any way to fix these errors?
Thanks!

Bye
Tiziano

Hi @Tiziano,

Not sure why the link isn’t working; it looks correct to me. Regarding the image, it’s possible that the TCPDF library doesn’t support the image’s format for some reason, or that it fails to find or load it. Can you send me this XML and the images?
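One way to rule out the XML itself is to confirm that every `<xref>` `rid` points to an element carrying that `id`. A minimal sketch using Python’s standard library, assuming the JATS file has no default namespace and no DTD-defined entities (the helper `dangling_xrefs` is hypothetical). An empty result only means the markup is consistent; it says nothing about how the plugin renders the link.

```python
import xml.etree.ElementTree as ET


def dangling_xrefs(jats_xml: str) -> list[tuple[str, str]]:
    """Return (ref-type, rid) pairs for <xref> elements whose rid matches no id."""
    root = ET.fromstring(jats_xml)
    # Collect every id attribute in the document (fig, table-wrap, ref, ...)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = []
    for xref in root.iter("xref"):
        # rid is an IDREFS attribute, so it may list several space-separated ids
        for rid in (xref.get("rid") or "").split():
            if rid not in ids:
                missing.append((xref.get("ref-type", ""), rid))
    return missing


if __name__ == "__main__":
    sample = """<article><body>
      <p>See <xref ref-type="fig" rid="fig001">Figure 1</xref>
         and <xref ref-type="table" rid="tab009">Table 9</xref>.</p>
      <fig id="fig001"><label>Figure 1</label></fig>
    </body></article>"""
    print(dangling_xrefs(sample))  # [('table', 'tab009')]
```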

Hi @Vitaliy,
thanks for your help; I’m sending you the XML and images now.

Bye!
Tiziano

Hi @Tiziano,

Yes, I can confirm that links for figures and tables don’t work and I’ll make a fix for that.

Regarding conversion to PDF, everything works fine on my OJS instance with your example. In my opinion, this TCPDF error means that there is a problem accessing that image for some reason, or that the PHP library TCPDF uses for this step (determining the image size) is missing. The latter is unlikely because TCPDF relies only on standard libraries, and the former is unlikely because the path is correct and accessible. So I have no idea.
Actually, check this answer on StackOverflow: https://stackoverflow.com/a/28007106/6711224
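The error message shows TCPDF working from the image’s download URL, so it can help to check from the server what that URL actually returns: an image, or something else such as an HTML error or login page. A minimal diagnostic sketch in Python, assuming you can run it on the same host as OJS (the helper names are hypothetical, and this does not reproduce TCPDF’s own size detection):

```python
import urllib.request
from typing import Optional

# Leading bytes ("magic numbers") of common raster image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return the image format implied by the first bytes, or None if unrecognized."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def check_image_url(url: str, timeout: float = 10.0) -> str:
    """Fetch the start of url and describe what the server actually sent back.

    An HTTP error status raises urllib.error.HTTPError, which is itself a useful answer.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "unknown")
        head = resp.read(16)
    fmt = sniff_image_format(head)
    if fmt is None:
        # Often an HTML error or login page instead of the image itself
        return f"not a recognized image (Content-Type: {content_type})"
    return f"{fmt} image (Content-Type: {content_type})"
```

For example, run `print(check_image_url("https://audiologyresearch.org/index.php/audio/article/download/214/252/2042"))` on the server. A result other than a recognized image suggests the server cannot fetch the file the way a browser does.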

Hi @Vitaliy,
An update on the PDF problem: I installed the latest version of the Old Gregg theme with JATS Parser and PDF conversion, which is great for us! It works, but the link <xref ref-type="fig" rid="fig001">Figure 1</xref> still doesn’t work. The PDF is now generated correctly and shows the table, but not the figures, and, as in the HTML view, the links don’t work. I hope I’ve made myself clear. :smile:

Best regards
Bye
Tiziano

Hi @Vitaliy, I am using the OldGregg theme for our journal. It’s fantastic!
However, there are a couple of things we would like to customize, and I would truly appreciate your help:

  1. First, right now we don’t need the JatsParser functionality, since we upload our own PDF versions of the articles. So we would like to hide or disable the attachment icons that are automatically displayed in the left margin of the HTML version that OldGregg renders from JATS (see the attached screenshot). Can you tell me which file I should change to accomplish that?

  2. When our PDF is loaded in the PDF viewer, the bottom of the screen stays blank (white) for some reason. It is still possible to scroll the PDF to see the bottom of the page, but it looks odd. Do you think this is related to OldGregg, or do you have any idea how we can fix it?

Thanks in advance!

Hi @GabeLon,

Yes, the PDF galley page isn’t styled, because the idea behind the theme is to generate the PDF from JATS XML on the fly.

Anyway, I’m working on major improvements for the Old Gregg theme, which include re-styling all pages to make the look more modern and clean and, as always, drastically different from any existing theme. Most probably the beta release of v2 will be ready at the end of the summer, and it will also include styling for the PDF galley page. Sorry that I can’t implement the fix before the next release.

The idea behind the second release of the theme will remain the same: it will be mostly for journals that follow a JATS XML workflow, want to publish the full text of the article on the article landing page, and are centered on a continuous publishing model.

I’ll soon make an announcement showing sample pages from the new version of the theme; suggestions are welcome.

Meanwhile, here is a mockup (already implemented) of one of the simplest pages:


Hi @Vitaliy

First, thanks a lot for your work. It’s awesome.

In a previous thread we talked about styling PDFs. I think that without styling, most journals won’t want to enable PDF generation, because they need their PDFs to be highly customized. In the end, what they ultimately expect from JATS is automatic PDF generation.

I still need to read a lot about it to fully understand the options we have, but my preliminary thoughts are:

  • TCPDF is very limited, and all styling needs to be done in code. I don’t think that’s the way to go.
  • TCPDF + Smarty templates + CSS: this would improve things, but you still won’t have good control over PDF generation (what to do with orphans, or with tables when pagination breaks them…).
  • XSL-FO: I haven’t worked with it, but it’s presented as the right tool for the job. I’m unsure how easy it will be for journals to create their templates and styles, and we need an easy tool.

I found these two links useful for starting to understand the whole picture:

But honestly, I’m kind of lost among names like Saxon, Calabash, Xalan… and even more confused about the licenses.

I’d love to hear your thoughts here, or whether you already have a clear development plan.

Cheers,
m.

Hi @marc,

When choosing a conversion library, I look at 1) its license, 2) its dependencies, and 3) the end result.
TCPDF is good because it’s lightweight, fast (very fast), doesn’t require any external tools, and will run on most shared hosting. Moreover, it does much of the work automatically.

I also considered LaTeX but it requires a compiler to be installed on the server.

Regarding styling, I don’t know of any tools that allow styling that is both extensive and simple. I’m thinking of making 2-3 templates for TCPDF output; additionally, they can be styled (to a limited extent) with CSS.

If you find a better tool, preferably PHP or JS-based, let me know!

I understand. Really good arguments, and I fully share them.

This link compares different PHP PDF generation libraries and also concludes that TCPDF is the right choice:


I will try to do some research during summer to see if I find something useful.

Honestly, I was thinking the right way was XSL-FO, but I recently read that XSL-FO is almost dead, so it looks like Smarty + CSS + TCPDF could be the approach.

Is this your line of working?

What do you think of Snappy?
Yes, it requires wkhtmltopdf as a dependency, but is that a cost worth paying?
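Snappy is a PHP wrapper around the wkhtmltopdf command-line tool, so the dependency question comes down to whether that binary can be installed and executed on the server. The sketch below is in Python for illustration only (the helper names are hypothetical, and it does not use Snappy’s own API); it shows the general pattern of shelling out to wkhtmltopdf, with `--page-size` as the one option used.

```python
import shutil
import subprocess


def wkhtmltopdf_command(html_path: str, pdf_path: str, page_size: str = "A4") -> list[str]:
    """Build a wkhtmltopdf command line: options first, then input and output paths."""
    return ["wkhtmltopdf", "--page-size", page_size, html_path, pdf_path]


def html_to_pdf(html_path: str, pdf_path: str) -> None:
    """Convert an HTML file to PDF by running the wkhtmltopdf binary."""
    # The external binary is the dependency in question: it must exist on the server
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    # check=True raises CalledProcessError if the conversion exits with an error
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
```

On shared hosting, where installing or executing arbitrary binaries is often not permitted, the `shutil.which` check would fail. This is the same concern raised above about installing a LaTeX compiler.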

A broad first search led me to these two links, with nice licenses but Python as a requirement (I also don’t like breaking our stack, but the sample files are really promising):

Again, I’d love to hear your opinions.

Keep in touch,
m.