DOAJ export plugin does not recognise the en dash in page ranges

Hi!
The DOAJ export plugin does not transfer the article's start and end pages to the XML file if the page numbers are separated by an unspaced en dash, e.g. 10–23. If they are separated by a hyphen, e.g. 10-23, the start and end pages do appear in the XML file.
The unspaced en dash is typographically correct here, so I was wondering how to fix the plugin so that it recognises page numbers separated by a dash. Which file should I modify …?

Regards,
Jože


Hi @aas,

Have a look at plugins/importexport/doaj/classes/DOAJExportDom.inc.php in the generateArticleDom function for a call to getPages.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi, @asmecher,

Thanks,

Jože

Hi @asmecher again,

I found the place in the file, replaced all the short dashes (hyphens) with long ones (en dashes), and ran the plugin, but the page numbers still do not appear in the XML file. Could you suggest what specifically needs to be changed for this to take effect?

This is the original code:

```php
/** --- FirstPage / LastPage (from PubMed plugin) ---
 * there is some ambiguity for online journals as to what
 * "page numbers" are; for example, some journals (eg. JMIR)
 * use the "e-location ID" as the "page numbers" in PubMed
 */
$pages = $article->getPages();
if (preg_match("/([0-9]+)\s*-\s*([0-9]+)/i", $pages, $matches)) {
    // simple pagination (eg. "pp. 3-8")
    XMLCustomWriter::createChildWithText($doc, $root, 'startPage', $matches[1]);
    XMLCustomWriter::createChildWithText($doc, $root, 'endPage', $matches[2]);
} elseif (preg_match("/(e[0-9]+)/i", $pages, $matches)) {
    // elocation-id (eg. "e12")
    XMLCustomWriter::createChildWithText($doc, $root, 'startPage', $matches[1]);
    XMLCustomWriter::createChildWithText($doc, $root, 'endPage', $matches[1]);
}
```

Thanks in advance,
Jože
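To reproduce the reported behaviour outside OJS, the sketch below runs the simple-pagination pattern from the quoted `generateArticleDom()` code against both spellings. It uses only PHP's standard `preg_match()`; `XMLCustomWriter` is left out, so the script prints the result instead of writing XML elements. Because `10–23` also contains no `e`-prefixed number, the elocation-id branch does not fire either, which is why no page elements end up in the XML.

```php
<?php
// The simple-pagination pattern from generateArticleDom(), copied verbatim.
$pattern = "/([0-9]+)\s*-\s*([0-9]+)/i";

foreach (['10-23', '10–23'] as $pages) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $found = preg_match($pattern, $pages, $matches);
    echo $pages, ' => ', $found ? "start {$matches[1]}, end {$matches[2]}" : 'no match', PHP_EOL;
}
// Output:
// 10-23 => start 10, end 23
// 10–23 => no match
```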

@aas, are you familiar with regular expressions? The line in question is a regular expression, and its character ranges (such as [0-9]) need the hyphen to work correctly, so replacing every hyphen on that line breaks them.
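To illustrate, the sketch below applies a blanket hyphen-to-en-dash replacement to the original pattern with `str_replace()`, roughly what a search-and-replace over that line does. Once `[0-9]` becomes `[0–9]` it is no longer a range but a plain set of characters, so digits such as 1 or 2 stop matching and the rewritten pattern recognises neither spelling.

```php
<?php
// Simple-pagination pattern from generateArticleDom().
$original = '/([0-9]+)\s*-\s*([0-9]+)/i';
// Roughly what "replace every hyphen with an en dash" does to it:
// the separator changes, but so do both [0-9] ranges.
$blanket = str_replace('-', '–', $original);
echo $blanket, PHP_EOL;                         // /([0–9]+)\s*–\s*([0–9]+)/i

// [0-9] is a range (any digit); [0–9] is just a set containing '0', '9'
// and the bytes of the en dash, so '15' no longer matches.
echo preg_match('/^[0-9]+$/', '15'), PHP_EOL;   // 1
echo preg_match('/^[0–9]+$/', '15'), PHP_EOL;   // 0

// As a result, the rewritten pattern matches neither spelling.
echo preg_match($blanket, '10–23'), PHP_EOL;    // 0
echo preg_match($blanket, '10-23'), PHP_EOL;    // 0
```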

You want to replace just the hyphen between those two numeric ranges with a character class or an alternation group that matches either kind of dash.
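A minimal sketch of the alternation-group approach, assuming the page strings are stored as UTF-8. The hypothetical `extractPages()` function mirrors the two branches of the quoted code but returns the start and end pages instead of writing XML; in the plugin itself only the pattern in the first `preg_match()` call would change. The separator `(?:-|–)` matches either a hyphen or the whole en dash byte sequence, so no `u` modifier is needed, and the `[0-9]` ranges keep their ASCII hyphen.

```php
<?php
// Hypothetical stand-in for the page parsing in generateArticleDom():
// returns [startPage, endPage], or null if neither branch matches.
function extractPages(string $pages): ?array
{
    // Separator: non-capturing group accepting a hyphen-minus or an en dash.
    if (preg_match('/([0-9]+)\s*(?:-|–)\s*([0-9]+)/i', $pages, $matches)) {
        return [$matches[1], $matches[2]];   // simple pagination (eg. "pp. 3-8")
    } elseif (preg_match('/(e[0-9]+)/i', $pages, $matches)) {
        return [$matches[1], $matches[1]];   // elocation-id (eg. "e12")
    }
    return null;
}

echo json_encode(extractPages('10-23')), PHP_EOL;      // ["10","23"]
echo json_encode(extractPages('10–23')), PHP_EOL;      // ["10","23"]
echo json_encode(extractPages('pp. 3 – 8')), PHP_EOL;  // ["3","8"]
```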

@asmecher, any reason not to just pull this into core?

Hi @ctgraham,

Yes, I think this would be suitable for merging.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @ctgraham,

Thanks for the advice. Unfortunately, I am not familiar with regular expressions, but with the help of http://www.regexr.com/ I found out which hyphen I had to replace with an en dash to make it work.
I changed \s*- to \s*– and it works.

Thanks again,
Jože

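Just changing \s*- to \s*– means you won't be able to use the hyphen in the entry. Using a character class containing both, \s*[–-], will allow for either (add the u modifier to the pattern so that the en dash is read as a single UTF-8 character).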
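The sketch below checks both points in PHP, assuming the page strings are stored as UTF-8. PHP's `preg_*` functions work on bytes unless the pattern carries the `u` modifier; in byte mode the three bytes of the en dash inside `[–-]` become three separate class members, so the class never matches the whole dash. Note that with `u`, `preg_match()` returns `false` for a subject that is not valid UTF-8.

```php
<?php
// The en-dash-only change: only the separator was swapped.
$enDashOnly = '/([0-9]+)\s*–\s*([0-9]+)/i';
echo preg_match($enDashOnly, '10–23'), PHP_EOL;   // 1
echo preg_match($enDashOnly, '10-23'), PHP_EOL;   // 0: hyphenated entries are now missed

// Character class with both dashes, byte mode (no u): the 3-byte en dash
// inside [...] becomes three one-byte members, so it cannot match the dash.
$classBytes = '/([0-9]+)\s*[–-]\s*([0-9]+)/i';
echo preg_match($classBytes, '10–23'), PHP_EOL;   // 0
echo preg_match($classBytes, '10-23'), PHP_EOL;   // 1

// With u, pattern and subject are read as UTF-8 and the class accepts either dash.
$classUtf8 = '/([0-9]+)\s*[–-]\s*([0-9]+)/iu';
echo preg_match($classUtf8, '10–23'), PHP_EOL;    // 1
echo preg_match($classUtf8, '10-23'), PHP_EOL;    // 1
```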

@ctgraham
Thanks for the warning and the advice. I realize that, but since I have already entered en dashes in all the older articles, it will not be a problem. I generated the XML and all the start and end pages were exported. From now on I will also make sure to write an en dash between the start and end pages, as APA style prescribes :slightly_smiling:.

Thanks,
Jože