DOCX to JATS XML converter

What operating system are you using?

I use Windows 10.

E.g.: https://www.howtogeek.com/235101/10-ways-to-open-the-command-prompt-in-windows-10/

I've made the first non-production release of the DOCX to JATS XML Converter Plugin for OJS 3.1+: Release docxConverter-beta1 · Vitaliy-1/docxConverter · GitHub

It produces output suitable for the Texture Plugin.

For those who are interested in testing, there are some instructions and examples here: GitHub - Vitaliy-1/docxConverter: Plugin for OJS 3 that parses DOCX and converts it to JATS XML format

Requirements:

  • PHP 7.2+
  • php-xml (usually installed by default).
  • php-zip (usually installed by default).
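
For anyone who wants to check these requirements before installing, here is a minimal sketch. It assumes the `php` command-line binary is on the PATH and is the same PHP build the web server uses (often not the case, so treat the result as a hint only). The function names are made up for illustration; the only real commands used are `php -v` and `php -m`.

```python
import re
import subprocess

REQUIRED_MODULES = {"xml", "zip"}  # php-xml and php-zip from the list above
MIN_PHP = (7, 2)


def parse_php_version(version_output):
    """Extract (major, minor) from the output of `php -v`."""
    match = re.search(r"PHP (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a PHP version string")
    return int(match.group(1)), int(match.group(2))


def missing_modules(modules_output):
    """Return the required modules that are absent from `php -m` output."""
    loaded = {line.strip().lower() for line in modules_output.splitlines()}
    return sorted(REQUIRED_MODULES - loaded)


def meets_requirements(version_output, modules_output):
    """True if the PHP version is new enough and both modules are loaded."""
    version_ok = parse_php_version(version_output) >= MIN_PHP
    return version_ok and not missing_modules(modules_output)


def check_local_php():
    """Run the local `php` binary and check it (assumes `php` is on the PATH)."""
    version = subprocess.run(["php", "-v"], capture_output=True, text=True).stdout
    modules = subprocess.run(["php", "-m"], capture_output=True, text=True).stdout
    return meets_requirements(version, modules)
```

Calling `check_local_php()` returns `True` when the command-line PHP satisfies all three requirements.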

Thanks a lot, @Vitaliy.
We would be pleased to test it.

Hi @Vitaliy
I have tried it with the latest oldGregg. It is great, although some tables need a bit of work in Texture. Thanks so much for your work.

I have one issue:
It seems that the citation [1] is not parsed and linked to the reference. Is this not included in this alpha release? Or how could the link be made in the XML file?

You mean at the front-end? Yeah, I need to fix it. It's parsed but not linked.

Yup… Waiting for the fix.
Is it planned for the next release?

Yes, but before that I want to make a production release of the DOCX Converter Plugin.


Hi @Vitaliy
Is there any manual way to link the citation and the reference while waiting for the fix in the next release? We are going to upgrade our journal to the latest OJS and oldGregg soon and publish a new issue.

Hi @Vitaliy,
I installed the plugin without any problems, but when I click on the link nothing happens. Did I forget to activate something?
Thanks!

Bye
Tiziano

It requires modifying the JATS Parser library code. Most probably here: https://github.com/Vitaliy-1/JATSParser/blob/master/src/JATSParser/HTML/Text.php#L42
I think I missed something there.


Hi @Tiziano,

So, you are pressing the Convert to JATS XML button but nothing happens, right?
If so, there should definitely be a fatal error in the PHP logs that indicates the reason. Let me know if you find something.
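
If the log is large, a small filter can help find the relevant entry. This is only a sketch: it assumes the log uses PHP's usual `PHP Fatal error:` prefix, and the log location depends on the server configuration (the `error_log` setting in php.ini or the web server's own error log), so the path has to be supplied by hand. The function names are hypothetical.

```python
def find_fatal_errors(log_text, keyword=None):
    """Return log lines that report a PHP fatal error.

    If `keyword` is given, keep only lines that mention it
    (case-insensitive), e.g. the plugin folder name.
    """
    hits = [line for line in log_text.splitlines() if "PHP Fatal error" in line]
    if keyword is not None:
        hits = [line for line in hits if keyword.lower() in line.lower()]
    return hits


def read_log(path):
    """Read a log file, tolerating stray bytes in the encoding."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

For example, `find_fatal_errors(read_log(path), "docxConverter")` would list only the fatal errors that mention the plugin, where `path` is wherever your server writes its PHP errors.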

Hi @Vitaliy, great work!
Okay, so the reason apparently was that I tested it on OJS version 3.1.1.4. I installed it on version 3.1.2 and it works: it generates the XML file (but only if the source file has the .DOCX extension; with .DOC the button does not appear).
I'll tell you two things. The first is that editing in Texture doesn't work in Safari, but it works in Firefox (I didn't check Chrome). Once I make corrections in Texture, does the XML file update automatically in OJS?
The second is that the XML file produced by the DOCX Converter does not work with the eLens View plugin.
I hope these observations help you. A good plugin like this would be great for us.

Bye
Tiziano

DOC and DOCX are quite different formats. It's possible to convert DOC to DOCX with PHP, but it requires a heavy 3rd party library. I'm not sure it is needed, because the same can be accomplished with almost any text editor (like MS Word or LibreOffice).
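
To illustrate how different the two formats are: a DOCX file is a ZIP archive whose main part is `word/document.xml`, while a legacy DOC file is an OLE compound binary file with a fixed 8-byte signature. The sketch below tells them apart by content rather than by file extension. The function name is made up, and the check is deliberately rough (the OLE signature is shared by other legacy Office formats such as XLS and PPT).

```python
import io
import zipfile

# Signature at the start of every OLE compound file (legacy DOC, XLS, PPT).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_word_format(data):
    """Classify raw file bytes as 'docx', 'doc', or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        # Legacy binary container; assumed here to be a DOC file.
        return "doc"
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # The main document part of a DOCX package.
            if "word/document.xml" in archive.namelist():
                return "docx"
    return "unknown"
```

A file renamed from .doc to .docx would still be reported as `doc`, which is why renaming alone is not enough and the file has to be re-saved from a word processor.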

Can you specify the version of the Safari browser? You can open an issue on the plugin's page with the error from the browser's console: Issues · pkp/texture · GitHub, though it may be a problem with Texture itself rather than with the plugin.

Yes. You need to press the save button (upper-right corner, as far as I remember).

My ability to fix the Lens Viewer plugin is quite limited. What I'm definitely planning is to make the output compatible with the JATS Parser Plugin.

Sure, no doubt! My first aim now is to make sure that it is compatible with DOCX files created from different sources. It's problematic without tests from users.

Dear @Vitaliy

I found some cases:

First, it seems that the table is not displayed properly in the article details. I tried your sample XML docxToJats/test_jats.xml at 78b9c0fa46fa77de427b63cf93c8254b1acbd2c5 · Vitaliy-1/docxToJats · GitHub here: https://dryam.website/index.php/jer/article/view/7. It is OJS 3.1.2 with the latest oldGregg.

Second, is the citation, e.g. [1,2], parsed in the sample? Or is it just the code problem you explained?

Third, I converted a DOCX with some tables that contain over 100 words (a table with that many words may sound strange, but we have some). Only half of the text or less was parsed. I then tried adding the missing text via the Texture plugin, but it was not saved, so I added it manually in the XML file. Is there a limit on the number of words in a table for it to be parsed properly?

As far as I remember, I probably fixed it in the JATS Parser library but haven't updated the Old Gregg theme. Thanks, I'll check.

I haven't added support for citations to the DOCX Converter Plugin yet. I'm thinking about possible approaches. In DOCX, citations and references have no special markup unless specific MS Word or LibreOffice citation tools are used. I suppose the best option would be to integrate with Zotero, which can be used as a plugin in those text editors.
Until then, they can be added manually with the Texture plugin.
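
For those who prefer to edit the XML file directly: in JATS a citation is linked with an `<xref ref-type="bibr" rid="...">` element whose `rid` matches the `id` of a `<ref>` in the reference list. The sketch below rewrites bracketed numbers in the serialized text of a paragraph. It is an illustration only: the function name is made up, and the `B` id prefix is an assumption that has to match the `id` values actually present in your `<ref-list>`.

```python
import re

# Matches [1] as well as lists such as [1,2] or [3, 4, 5].
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def link_citations(paragraph_xml, id_prefix="B"):
    """Wrap each bracketed citation number in a JATS bibliographic xref.

    `id_prefix` is an assumption: it must match the ids used by the
    <ref> elements in the article's reference list.
    """
    def replace(match):
        numbers = [n.strip() for n in match.group(1).split(",")]
        links = [
            '<xref ref-type="bibr" rid="{0}{1}">{1}</xref>'.format(id_prefix, n)
            for n in numbers
        ]
        return "[" + ", ".join(links) + "]"

    return CITATION.sub(replace, paragraph_xml)
```

Note that this is a plain text substitution, so it would also touch bracketed numbers that are not citations. It is best applied paragraph by paragraph and checked by eye.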

Hmm, tables should be converted normally. Can you send me a problematic DOCX file? I can only take a look in 1-2 weeks, though.
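
In the meantime, one way to see how much table text the source file really contains is to count the words per table directly in the DOCX and compare that with the converted XML. This is a rough diagnostic sketch, not part of the plugin. It relies only on the standard WordprocessingML element names (`w:tbl`, `w:p`, `w:t`), and the function name is made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def table_word_counts(docx_bytes):
    """Return the word count of each table in a DOCX file, in document order.

    Nested tables are listed separately and are also counted as part
    of the table that contains them.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    counts = []
    for table in root.iter(W + "tbl"):
        words = 0
        # Count per paragraph so text from neighbouring cells is not merged.
        for paragraph in table.iter(W + "p"):
            text = "".join(t.text or "" for t in paragraph.iter(W + "t"))
            words += len(text.split())
        counts.append(words)
    return counts
```

If a table reports, say, 150 words here but far fewer appear in the JATS output, that narrows the problem down to the conversion step rather than the source file.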


I think it is a good idea if possible.

As far as I know, it works best with Chrome.

How is this going? We plan to upgrade soon.