DOCX to JATS XML converter


#25

Hi @Vitaliy, in version 1.0.3 of DOCX2JATS the file 1.jar is not found… When the archive is decompressed, the file is not there. Is this an error, or is there another way to execute the command java -jar 1.jar C:\jats\article1.docx article1.xml?

Thanks!!


#26

Hi @David_Alarcon_davidy,
Hmm, that’s odd. Can you confirm that the archive docx2jats-1.0.3.zip does not contain the file 1.jar?

The archive is here


#27

Can we justify text in XML?


#28

XML (like HTML, a closely related markup language) is just a structure: nodes (elements) and data (text).


#29

Is there any way we can justify it? I have been trying for the last 2 days, but have not yet been able to do it completely.


#30

This is a matter of styling. You can modify the CSS inside the Lens plugin (the lens.css file or similar) to accomplish this.


#32

Thanks @Vitaliy… Sorry, I used this https://github.com/Vitaliy-1/DOCX2JATS and should have used this https://github.com/Vitaliy-1/DOCX2JATS/releases

Thanks!


#33

Hi! @Vitaliy

What format should authors’ names and data have to be recognized in the docx to Jats XML conversion process?

Thanks for Docx to Jats XML!!


#34

A DOCX document does not contain metadata (actually, it can contain the author name and article title if the author explicitly sets them in the document settings in Word, but none of our authors do so). That’s why all metadata can only be added manually. But that’s a matter of several minutes.

As for article content, you can look at our example articles in the project’s root folder on GitHub. Simply download article1.docx, article2.docx and article_english.docx from here: https://github.com/Vitaliy-1/DOCX2JATS. Click on the file -> then click on the download button. Pay attention to:

  • how section (including the reference section) and subsection titles are styled (headings of the 1st and 2nd rank -> you can set them on the main tab of Microsoft Word)

  • how links to references are indicated in the text (numbers in square brackets), as well as links to tables and links to figures.

  • how tables are labeled (title and caption).

  • how references are formatted inside the reference section. The program will only parse references in Vancouver or AMA style (they are nearly identical). It supports journals, books, chapters and conferences.

Because the DOCX (OOXML) format is not well structured, the program produces well-structured JATS XML mainly by parsing the text and symbols inside the DOCX file. For example, references are parsed with regular expressions that look at the dots, commas, etc. inside the reference. If a reference has a dot in an inappropriate place, for example additional dots after author initials, the parsing will fail.
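
To illustrate why a misplaced dot matters, here is a minimal Python sketch of dot-based parsing for a Vancouver-style journal reference. It is not the regular expression DOCX2JATS actually uses; the pattern, the function name parse_journal_reference and the sample reference are all assumptions made up for the example.

```python
import re

# Hypothetical pattern (an assumption, not taken from DOCX2JATS):
# Authors. Title. Journal. Year;Volume(Issue):Pages.
JOURNAL_REF = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"      # author list ends at the first dot
    r"(?P<title>[^.]+)\.\s+"         # article title ends at the next dot
    r"(?P<journal>[^.]+)\.\s+"       # abbreviated journal name
    r"(?P<year>\d{4});"              # publication year
    r"(?P<volume>\d+)"               # volume number
    r"(?:\((?P<issue>[^)]+)\))?"     # optional issue in parentheses
    r":(?P<pages>[\d-]+)\.?$"        # page range, optional final dot
)


def parse_journal_reference(text):
    """Return the captured fields as a dict, or None if the pattern fails."""
    match = JOURNAL_REF.match(text.strip())
    return match.groupdict() if match else None


# Made-up sample references, for illustration only.
well_formed = "Doe J, Roe R. An example article title. Example J Med. 2017;5(1):10-14."
extra_dot = "Doe J., Roe R. An example article title. Example J Med. 2017;5(1):10-14."

print(parse_journal_reference(well_formed))
print(parse_journal_reference(extra_dot))  # the dot after the initial breaks the match
```

Because every field is delimited by a dot, a single extra dot after an initial shifts all the boundaries and the whole reference is rejected.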

The metadata can be added manually to the appropriate nodes in the generated JATS XML.
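
If editing by hand becomes tedious, the same step could be scripted. The following Python sketch assumes the generated file has an article root element whose article-meta is missing or empty; the function name add_article_metadata and the sample title are hypothetical, and real output from the converter may need a different insertion point.

```python
import xml.etree.ElementTree as ET


def add_article_metadata(jats_xml, doi, title):
    """Insert a DOI and an article title into <front>/<article-meta>.

    Assumes <article-meta> is absent or empty, so element order is not an issue.
    """
    root = ET.fromstring(jats_xml)

    # Create <front> as the first child of <article> if it does not exist yet.
    front = root.find("front")
    if front is None:
        front = ET.Element("front")
        root.insert(0, front)

    # Create <article-meta> inside <front> if it does not exist yet.
    article_meta = front.find("article-meta")
    if article_meta is None:
        article_meta = ET.SubElement(front, "article-meta")

    # <article-id pub-id-type="doi"> holds the DOI.
    article_id = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
    article_id.text = doi

    # <title-group>/<article-title> holds the article title.
    title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Minimal stand-in for a converted document; the title is a placeholder.
    print(add_article_metadata("<article><body/></article>",
                               "10.22502/jlmc.v5i1.112",
                               "Example article title"))
```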

In the not-so-near future I am planning to rewrite the parser. For now it uses the TEIC Stylesheets, which are not ideal. In the future it will use the docx4j Java library instead.


#35

@anupent How were you able to get the DOI and journal name into the XML file? Could you share your sample file with the code here?


#39

Lol, yeah. Don’t worry. Take your time…


#41

You could even send me the code in person. But if you share it here, it will be helpful to everyone.


#42

Dear @varshilmehta

I write the front section as below:

 <front>
	<journal-meta>
		<journal-id journal-id-type="publisher-id">LMC</journal-id>
		<journal-title-group>
			<journal-title>Journal of Lumbini Medical College</journal-title></journal-title-group>
		<issn pub-type="epub">2542-2618</issn>
		<issn pub-type="ppub">2392-4632</issn>
		<publisher>
			<publisher-name>Lumbini Medical College</publisher-name></publisher>
	</journal-meta>
  
  <article-meta>
		<article-id pub-id-type="doi">10.22502/jlmc.v5i1.112</article-id>
<!--	<article-id pub-id-type="publisher-id"></article-id> -->
		<article-categories>
			<subj-group subj-group-type="heading">
				<subject>Original Research Article</subject></subj-group>
		</article-categories>
		<title-group>
			<article-title>Single dose Intraoperative Antibiotics versus Postoperative Antibiotics for Patient Undergoing Laparoscopic Cholecystectomy for Symptomatic Cholelithiasis: A Randomized Clinical Trial</article-title></title-group>
		<contrib-group>
			<contrib contrib-type="author" corresp="yes">
<!--				<contrib-id contrib-id-type="orcid"></contrib-id>	-->
				<name>
					<surname>Thapa</surname>
					<given-names>Sagun Bahadur</given-names></name>
				
				<xref ref-type="aff" rid="aff1"> <sup>1</sup></xref>
				<xref ref-type="corresp" rid="cor1"> <sup>*</sup></xref>
			</contrib>
	
			<contrib contrib-type="author">
<!--				<contrib-id contrib-id-type="orcid"></contrib-id>	-->
				<name>
					<surname>Kher</surname>
					<given-names>Yeshwant Ramakrishna</given-names></name>
				
				<xref ref-type="aff" rid="aff2">
					<sup>2</sup></xref>
			</contrib>
			
			<contrib contrib-type="author">
<!--				<contrib-id contrib-id-type="orcid"></contrib-id>	-->
				<name>
					<surname>Tambay</surname>
					<given-names>Yashwant Gajanan</given-names></name>
				
				<xref ref-type="aff" rid="aff3">
					<sup>3</sup></xref>
			</contrib>
	
		</contrib-group>
		<corresp id="cor1">
			<sup>*</sup> E-mail:  <email>tsagun.nams75@gmail.com</email>
		</corresp>
		<aff id="aff1">
			<sup>1</sup>
			<country>
				Lecturer, Department of General Surgery, Lumbini Medical College, Palpa, Nepal.
<!--				<ext-link ext-link-type="domain-name">http://lmc.edu.np</ext-link>	-->
			</country>
		</aff>
		<aff id="aff2">
			<sup>2</sup>
			<country>
				Professor, Department of General Surgery, Lumbini Medical College, Palpa, Nepal.
<!--				<ext-link ext-link-type="domain-name">http://lmc.edu.np</ext-link>	-->
			</country>
		</aff>
		<aff id="aff3">
			<sup>3</sup>
			<country>
				Professor and Head, Department of General Surgery, Lumbini Medical College, Palpa, Nepal.
<!--				<ext-link ext-link-type="domain-name">http://lmc.edu.np</ext-link>	-->
			</country>
		</aff>
		<pub-date pub-type="epub">
		   <day>10</day>
		   <month>4</month>
		   <year>2017</year>
		</pub-date>
		<pub-date pub-type="collection">
		   <month>6</month>
		   <year>2017</year>
		</pub-date>
		<volume>5</volume>
		<issue>1</issue>
	<!--	<history>
			<date date-type="received">
				<day></day>
				<month></month>
				<year></year></date>
			<date date-type="accepted">
				<day></day>
				<month></month>
				<year></year></date>
		</history>
	-->
		<permissions>
			<copyright-year>2017</copyright-year>
			<copyright-holder>&#xa9; Sagun Bahadur Thapa, Yeshwant Ramakrishna Kher, Yashwant Gajanan Tambay. </copyright-holder>
			<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
				<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license>
		</permissions>
		<abstract>
			<p><bold>Introduction: </bold> Surgical site infection is a common complication shown in literature following cholecystectomies. Smaller incision and use of trocars in laparoscopic cholecystectomy lessen the contamination resulting in less chances of surgical site infection. However, in fear of postoperative infection, many opt for the prolonged postoperative use of antibiotic and there is growing consensus against it. Antibiotics not only increases the cost and hospital stay duration but it aids in emergence of multidrug resistance. Because of the controversies, we conducted this clinical trial to see whether a single prophylactic dose of antibiotic at the time of induction of anesthesia for laparoscopic cholecystectomy was equally effective in controlling post-operative infection as multi-dose antibiotics during and post-operative period.</p>
			<p><bold>Methods: </bold>The study was conducted at the department of general surgery, Lumbini Medical College Teaching Hospital, from November 2015 to October 2016. All cases with symptomatic cholelithiasis subjected for laparoscopic cholecystectomy were enrolled. Patients were randomized into two groups; Group SD received single dose of an intravenous dose of amikacin 500 mg, at induction of anesthesia and Group MD received multiple intravenous dose of amikacin, during and postoperatively for two days. Complications, hospital stay, and treatment cost in two groups were compared and analyzed. </p>
			<p><bold>Results: </bold> There were a total of 240 patients in the study, 118 in Group SD and 122 in Group MD. Post-operative infection rate was 4.2% (n= 5, N=118) in Group SD and 3.3% (n=4, N=122) in Group MD; the difference was not significant (p=0.75). Hospital stay was prolonged and cost was higher significantly in Group MD. </p>
			<p><bold>Conclusion: </bold> Single dose of prophylactic antibiotic, administered at induction of anesthesia, is equally effective as multiple doses of post surgical antibiotics to prevent post-operative infection in patients undergoing elective laparoscopic cholecystectomy for uncomplicated cholelithiasis.</p>
			
			<p><bold>Keywords: </bold> antibiotics • laparoscopic cholecystectomy • length of hospital stay • prophylactic • surgical wound infection</p>
		</abstract>
		<counts>
			<ref-count count="15" />
			<page-count count="5" /> <!-- approximate -->
		</counts>
	</article-meta>
</front>

#43

Awesome work. Thanks a lot. :)


#45

Hi @Vitaliy! Your work is really useful. Thank you so much for everything you do.

I would like to ask a few questions. My English is very poor and I apologize, but I do the best I can.

1.- I want to be able to add reference types other than the supported journal articles, books, chapters and conferences. Could you give me some indication of how I can add them?

For example:

A) Both personal authors and organization as author.
B) No author given.
C) Organization(s) as author.
D) Scientific or technical report.
E) etc.
F) etc.

2.- The journals in which I am collaborating work with the APA standard.
I am currently changing the APA references to Vancouver but it is a very slow job.
Could you tell me which files to modify to get the APA standard working?
My knowledge about programming is very basic but I need to make this work for me.

Any guidance will be of great help to me.


#46

First of all, you must understand how it all works.

Because there wasn’t much time for development, I used the TEIC Stylesheets for several transformations. The resulting document is in JATS format, but each reference item is a simple string inside a list-item node.
This string is then parsed with regex to find predefined patterns. Each regular expression represents a bibliographic type (journal, book, chapter or conference). The program picks the best match among them and applies it to retrieve the data. If you are comfortable using regular expressions and the Java XPath and DOM libraries, I could explain further.
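
As a rough sketch of that idea (in Python rather than the program's Java, and not its actual code), one pattern per bibliographic type can be tried against the reference string and the best match kept. The two patterns, the name classify_reference, the sample references and the rule that "best" means "most non-empty captured fields" are all assumptions for illustration.

```python
import re

# Hypothetical patterns, one per bibliographic type (not taken from DOCX2JATS).
PATTERNS = {
    # Authors. Title. Journal. Year;Volume(Issue):Pages.
    "journal": re.compile(
        r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+(?P<source>[^.]+)\.\s+"
        r"(?P<year>\d{4});(?P<volume>\d+)(?:\((?P<issue>[^)]+)\))?"
        r":(?P<pages>[\d-]+)\.?$"
    ),
    # Authors. Title. Place: Publisher; Year.
    "book": re.compile(
        r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+"
        r"(?P<place>[^:.]+):\s*(?P<publisher>[^;.]+);\s*(?P<year>\d{4})\.?$"
    ),
}


def classify_reference(text):
    """Try every pattern and keep the match with the most non-empty fields.

    Returns (type_name, fields) or None when no pattern matches.
    """
    best = None
    for ref_type, pattern in PATTERNS.items():
        match = pattern.match(text.strip())
        if match is None:
            continue
        fields = {name: value for name, value in match.groupdict().items() if value}
        if best is None or len(fields) > len(best[1]):
            best = (ref_type, fields)
    return best


# Made-up sample references, for illustration only.
print(classify_reference("Doe J, Roe R. An example article title. Example J Med. 2017;5(1):10-14."))
print(classify_reference("Doe J. An example book title. Example City: Example Press; 2017."))
```

Supporting a new reference type in a design like this would mean adding one more entry to the pattern table, which is why it helps to be comfortable with regular expressions first.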

By the way, authors as persons and as organizations are both supported. Also, in the future I am planning to completely rewrite this parser. It will use the Java docx4j library, support Zotero references and native Word bibliography data, and use some tricks from machine learning to handle badly formatted references. That is, of course, if our journal project is successful.


#47

Hi @Vitaliy
I can’t run the jar file with cmd or java program

image


#48

Dear @kawahyu,

It seems that your Command Prompt is not open in the directory that contains the jar file.
Try to browse to c:\users\Ka\documents\jats in your command prompt and try again.
Or you can write the full path of the 1.jar file in your command prompt, which I find a bit tedious to type.

Regards,
@anupent


#49

Thanks @anupent
I have tried but it doesn’t work

image


#50

Edited: see bottom

Sorry, I could not properly convey what I meant.
Can you open your command prompt in your jats directory, as I have done with mine below?
Here my 1.jar file is in the d:\docs2jats folder.

image

Hope, it helps you.
@anupent

Or simply, in your case, after you open the command prompt, type
cd documents
cd jats
and then enter your command, i.e. java -jar 1.jar document.docx document.xml
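
For anyone who would rather not depend on the current directory at all, here is a small Python sketch of the full-path alternative mentioned earlier. The helper name build_convert_command is hypothetical, and it assumes the jar, the input file and the output file all sit in the same folder.

```python
from pathlib import PureWindowsPath


def build_convert_command(jar_dir, docx_name, xml_name):
    """Build the java -jar command using full Windows paths.

    With full paths, the command works no matter which directory
    the command prompt happens to be open in.
    """
    folder = PureWindowsPath(jar_dir)
    return [
        "java", "-jar",
        str(folder / "1.jar"),      # full path to the converter jar
        str(folder / docx_name),    # full path to the input DOCX
        str(folder / xml_name),     # full path to the output XML
    ]


if __name__ == "__main__":
    command = build_convert_command(r"c:\users\Ka\documents\jats",
                                    "document.docx", "document.xml")
    # Print the command line; passing the list to subprocess.run(command, check=True)
    # would execute it on a machine where Java is installed.
    print(" ".join(command))
```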