Yes, it should be compatible with Google Docs and LibreOffice.
I submitted a LibreOffice file (ODT) and the “Convert to JATS XML” button is not shown.
As for Google Docs, OJS3 only lets me submit a file… not a URL.
I’m testing docxConverter 0.5.1.0
If you can convert from all those sources, docxConverter is probably not the best name.
What about jatsConverter?
I primarily check the functionality with LibreOffice documents, thus those documents should have fewer bugs.
Waiting for your answer to find out how to test this… In confidence, my final goal is to find a way to cover the whole workflow with free software only.
Can you send me an MS Word document where headers aren’t currently recognized? Let me know if you need my email.
I’d love to, but Discourse only lets us upload images, so please mail me at marc.bria(spiral)gmail.com.
I will send you the ODT and DOCX that I use for testing.
Citations are in APA, inserted via the Zotero plugin (ODT with RefMarks, DOCX without).
I will also test Vancouver (which is one of the styles that I read is implemented).
I hope those documents will be good enough for testing (they cover common needs), but please let me know if you want me to include something else, or modify the files yourself (metadata in the doc? headers/footers? a different citation format? TOC?…) and send them back to me.
As for figures, citations and formulas, they indeed aren’t supported yet.
Yes, I noticed, and you explained it once or twice in the forum. Take it easy.
Extracting images should be a more or less easy task, as they are stored inside the DOCX archive. So they can be not only parsed but also uploaded to the system and attached to the galley file.
When you add a figure with the Texture plugin, the figure is attached to the XML document as a “Dependent file”. Both documents (ODT and DOCX) are zip files with pictures inside. So I suspect the plan is to parse the source, build the right JATS tags, then unzip the image and attach it as a “dependent” file, isn’t it?
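To illustrate the “unzip and attach” step, here is a minimal sketch in Python. It is only an illustration, not the plugin's code. The function name is made up, and it assumes the images sit in the conventional folders (`word/media/` for DOCX, `Pictures/` for ODT) instead of resolving them through the package's relationship or manifest files, as a real converter probably should.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Conventional image folders inside each package (assumption: a robust
# converter would follow the relationship/manifest entries instead).
MEDIA_PREFIXES = {".docx": "word/media/", ".odt": "Pictures/"}


def extract_embedded_images(archive, extension):
    """Return {image file name: raw bytes} for images embedded in the archive.

    `archive` is a path or binary file-like object; `extension` is
    ".docx" or ".odt". The function name is hypothetical.
    """
    prefix = MEDIA_PREFIXES[extension.lower()]
    images = {}
    with zipfile.ZipFile(archive) as package:
        for name in package.namelist():
            # Skip directory entries and everything outside the media folder.
            if name.startswith(prefix) and not name.endswith("/"):
                images[PurePosixPath(name).name] = package.read(name)
    return images


# Small self-contained demo: build a fake DOCX in memory and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/media/image1.png", b"not-a-real-png")
buffer.seek(0)

for file_name, data in extract_embedded_images(buffer, ".docx").items():
    print(file_name, len(data), "bytes")
```

Each entry of the returned dictionary would then be uploaded as a dependent file of the JATS galley, with the matching `<graphic>` tag pointing at the same file name.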
I suggest focusing on embedded images (which is the usual and also the easiest case) and dealing with externally linked images in the future.
BTW… as a feature request (for the far, far future), what about including the DAR format in your converter source list? I mean, in the end it is JATS with files and a manifest, and it would make OJS compatible with texture-desktop (which in some contexts could be more comfortable for editing than texture-web).
Parsing citations is a bit problematic as they can be in various formats.
“A bit”? You’re being ironic, aren’t you?
IMHO, this is the most difficult task you will have in this project.
I plan to support Zotero, probably native MS Word and LibreOffice citations.
If you want a second opinion here… I think it is better to cover one of them and do it really well (covering all citation formats) than to go with all three at the same time (and cover each only partially).
My bet would probably be Zotero, because it’s free software, multiplatform and a “must have” tool for authors, but I called one of my editors and he said that most authors are more familiar with Word, so…
It would need regex patterns for each citation style and for each reference type (book, journal article, chapter, thesis, etc.).
Time to quote Zawinski?
I have doubts about whether we need to do this with docxConverter. I mean, if authors deliver references in something more or less standard (BibTeX? JSON?), we can probably import them later with Texture.
Even better: if authors submit their DOCX and BibTeX, would it be difficult for docxConverter to take both and do the job?
Trying to understand all the citation formats is crazy, so in the end I’m suggesting relying on citation tools for this task… I don’t know if I explained myself well.
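To show what I mean by relying on the citation tools, here is a rough Python sketch that takes references already structured by the tool and writes them as JATS `<element-citation>` entries, with no regex involved. Everything here is an assumption for illustration: I use CSL JSON (one of the formats Zotero can export) instead of BibTeX because it parses with the standard library, the helper name and the type mapping are made up, the sample record is invented, and only a handful of fields are covered.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative mapping from CSL item types to JATS publication-type values.
PUBLICATION_TYPES = {
    "article-journal": "journal",
    "book": "book",
    "chapter": "book",
    "thesis": "thesis",
}


def csl_item_to_ref(item):
    """Build a JATS <ref> element from one CSL JSON item (hypothetical helper)."""
    ref = ET.Element("ref", {"id": str(item["id"])})
    pub_type = PUBLICATION_TYPES.get(item.get("type"), "other")
    citation = ET.SubElement(ref, "element-citation", {"publication-type": pub_type})

    authors = item.get("author", [])
    if authors:
        group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
        for author in authors:
            name = ET.SubElement(group, "name")
            ET.SubElement(name, "surname").text = author.get("family", "")
            ET.SubElement(name, "given-names").text = author.get("given", "")

    if "container-title" in item:
        # Article or chapter: the item title sits inside a larger source.
        title_tag = "chapter-title" if item.get("type") == "chapter" else "article-title"
        ET.SubElement(citation, title_tag).text = item.get("title", "")
        ET.SubElement(citation, "source").text = item["container-title"]
    else:
        ET.SubElement(citation, "source").text = item.get("title", "")

    date_parts = item.get("issued", {}).get("date-parts", [])
    if date_parts and date_parts[0]:
        ET.SubElement(citation, "year").text = str(date_parts[0][0])
    if "volume" in item:
        ET.SubElement(citation, "volume").text = str(item["volume"])
    return ref


# Invented sample record, shaped like a CSL JSON export.
csl_export = json.loads("""
[{"id": "doe2019", "type": "article-journal",
  "author": [{"family": "Doe", "given": "Jane"}],
  "title": "An example article", "container-title": "Journal of Examples",
  "issued": {"date-parts": [[2019]]}, "volume": "12"}]
""")

ref_list = ET.Element("ref-list")
for item in csl_export:
    ref_list.append(csl_item_to_ref(item))
print(ET.tostring(ref_list, encoding="unicode"))
```

The point is that the reference type and the individual fields arrive already separated, so the converter would not need to guess them from a formatted string.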
I haven’t seen yet how formulas are implemented in OOXML compared to JATS. I need to dive into the guidelines, but as far as I know they are highly structured, making support for them quite possible.
And it looks like the JATS standard is not clear about this…
Texture reads a LaTeX variant, while JATS4R is working with MathML.
I think we need to clarify the direction here before starting to code.
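To make the two directions concrete, here is a small Python sketch that writes the same formula (x squared) in both forms inside a JATS `<disp-formula>`: once as `<tex-math>` and once as `<mml:math>`. The function names are made up and the MathML is built by hand. It is only meant to show what the converter output could look like in each case, not how OOXML math would actually be converted.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with the usual "mml" prefix.
ET.register_namespace("mml", MML_NS)


def formula_as_tex(latex):
    """Wrap a LaTeX string in <disp-formula><tex-math> (hypothetical helper)."""
    formula = ET.Element("disp-formula")
    ET.SubElement(formula, "tex-math").text = latex
    return formula


def x_squared_as_mathml():
    """Hand-build x^2 as <disp-formula><mml:math> (hypothetical helper)."""
    formula = ET.Element("disp-formula")
    math = ET.SubElement(formula, f"{{{MML_NS}}}math")
    msup = ET.SubElement(math, f"{{{MML_NS}}}msup")
    ET.SubElement(msup, f"{{{MML_NS}}}mi").text = "x"  # base
    ET.SubElement(msup, f"{{{MML_NS}}}mn").text = "2"  # exponent
    return formula


print(ET.tostring(formula_as_tex("x^{2}"), encoding="unicode"))
print(ET.tostring(x_squared_as_mathml(), encoding="unicode"))
```

JATS also has an `<alternatives>` element that can hold both versions of one formula, which might be a way to keep both Texture and the JATS4R recommendations happy.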
BTW, DOCX Converter uses its own parsing mechanisms; it doesn’t rely on any 3rd party library, like the TEIC Stylesheets that are used by meTypeset or OxGarage. Thus, I hope, it doesn’t inherit their problems. The drawback is that it takes more time to develop.
All the projects you mention are great, but you did an impressive job, Vitaliy.
Thanks again.
PS: Testing table updated. It’s a wiki page, so anybody can join the testing or fix it if something is wrong.