Extract and save references - extra line spacing

nef · November 1, 2018, 10:20am

Hi
If I have to add references to the metadata after publishing an article, I have to use the so-called ‘Extract and save references’ button, but then I have to accept some extra line spacing: an empty line between the references. Is there any way to prevent that?
[Screenshot: Extract]

nef · December 5, 2018, 3:27pm

I’ve observed another error when using ‘Extracted References’. When I make the following extraction the number (21) in the very first reference is missing in the extracted references:

[Screenshot: number missing]

ctgraham · December 5, 2018, 6:50pm

The problem is here:

github.com

pkp/pkp-lib/blob/23ba62f8c0cc62d16c513810c1a1ef19705acf4e/classes/citation/CitationListTokenizerFilter.inc.php#L52


		// 2) Remove trailing/leading line breaks.
		$input = trim($input, "\n");
		// 3) Break up at line endings.
		if (empty($input)) {
			$citations = array();
		} else {
			$citations = explode("\n", $input);
		}
		// 4) Remove numbers from the beginning of each citation.
		foreach($citations as $index => $citation) {
			$citations[$index] = PKPString::regexp_replace('/^\s*[\[#]?[0-9]+[.)\]]?\s*/', '', $citation);
		}

		return $citations;
	}
}

We’re trying to strip out the numbering from numbered citations, but there isn’t a clear definition of when the numbering ends. As the code stands, the number can optionally end with a period, a closing parenthesis, or a closing square bracket, followed by zero or more spaces. Because that terminator is optional, the “21” in “21st” matches and is stripped.

Our regex here needs to consider this.
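To illustrate the behaviour, here is a minimal Python sketch that ports the quoted pattern and compares it with one possible stricter variant. The stricter pattern, the helper name `strip_numbering`, and the sample citations are assumptions made for illustration only; this is not a fix from pkp-lib.

```python
import re

# Pattern from step 4 of the quoted tokenizer, ported from PCRE to Python.
ORIGINAL = re.compile(r'^\s*[\[#]?[0-9]+[.)\]]?\s*')

# Hypothetical stricter pattern: the number must be followed by a closing
# delimiter or by whitespace, so "21st" no longer counts as numbering.
STRICTER = re.compile(r'^\s*[\[#]?[0-9]+(?:[.)\]]\s*|\s+)')


def strip_numbering(citation, pattern):
    """Remove a leading list number from a single citation line."""
    return pattern.sub('', citation, count=1)


# Made-up sample citations, not taken from the article in question.
samples = [
    '1. Smith, J. (2001).',
    '[2] Jones, K. (2005).',
    '21st Century Skills (2010).',
]
for sample in samples:
    print(repr(strip_numbering(sample, ORIGINAL)),
          repr(strip_numbering(sample, STRICTER)))
```

Even the stricter variant would still strip a leading number from an unnumbered citation that happens to start with digits followed by a space or period, so it narrows the problem rather than removing it.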

In your original question, is the “empty line” between references present in the actual data (as a blank reference), or is this just a matter of spacing within the display?

nef · December 6, 2018, 11:44am

Thank you.
And to your last question: the “empty line” (as a blank reference) shows up after publishing. Below you can see a before and after of using the extract. This is how it is presented to the reader.
Before:
[Screenshot: before extracting]

After:
[Screenshot: after extracting]

ctgraham · December 6, 2018, 12:23pm

If you use your browser’s Web Inspector tool to examine the page source, you’ll find that these references are contained within <p> tags. Your browser is applying its default paragraph styling to each <p> tag.

<div class="item references">
  <h3 class="label">References</h3>
  <div class="value">
    <p>Reference, Test: One. </p>
    <p>Reference, Test: Two. </p>
  </div>
</div>

To alter the display, add some custom CSS to override the margin property/properties which are adding this space, for example:

div.references div.value p {
    margin-block-start: 0;
    margin-block-end: 0;
}

nef · December 12, 2018, 9:04am

Why don’t you use the <br> tag instead?

ctgraham · December 12, 2018, 1:20pm

The <br> tag represents a structural markup tag rather than a semantic markup tag. The use of classes and ids on semantically relevant tags is preferred because it allows increased flexibility for styling (via CSS) and increased machine readability.

An alternate question could be: why were <p> tags used instead of <div> tags, or (even better) <li> tags? These references do not represent paragraph content in my mind.

nef · December 12, 2018, 3:05pm

I would really prefer a <br> tag instead of the <p> tag. Then you can maintain the original formatting (see the before/after above).

ctgraham · December 12, 2018, 4:54pm

The formatting is completely arbitrary; it is entirely based on your selected CSS.

That said, I think I see what you were getting at with respect to the <br> tag. If references are parsed, they are displayed in <p> tags. If the references are not parsed, they are reformatted with <br> tags separating each.

github.com

pkp/ojs/blob/dd1d5fa17c709b3d140fc9d43af06c4ea6897a9c/templates/frontend/objects/article_details.tpl#L199-L205


						{if $parsedCitations->getCount()}
							{iterate from=parsedCitations item=parsedCitation}
								<p>{$parsedCitation->getCitationWithLinks()|strip_unsafe_html} {call_hook name="Templates::Article::Details::Reference" citation=$parsedCitation}</p>
							{/iterate}
						{elseif $article->getCitations()}
							{$article->getCitations()|nl2br}
						{/if}

I think a better structure would be to always output them as an unordered list, classed to whether they are parsed or not.

The formatting would still be up to your CSS declarations, but the structure would be internally consistent and meaningful.
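As a rough sketch of that idea (in Python rather than the Smarty template itself), the same list structure could be emitted in both cases, with only a class differing. The function name `render_references` and the class names `parsed` and `unparsed` are hypothetical, and the sketch assumes plain-text citations, whereas the real template also injects links and hook output.

```python
from html import escape


def render_references(citations, parsed):
    """Return an HTML unordered list for a list of plain-text citations.

    The class on the <ul> records whether the citations were parsed,
    so CSS can style both cases while the structure stays the same.
    """
    state = 'parsed' if parsed else 'unparsed'
    items = '\n'.join(f'  <li>{escape(c)}</li>' for c in citations)
    return f'<ul class="references {state}">\n{items}\n</ul>'


print(render_references(['Reference, Test: One.', 'Reference, Test: Two.'], True))
```

With this structure, a rule such as `ul.references li { margin: 0; }` would control the spacing in both the parsed and unparsed case.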

nef · December 17, 2019, 10:18am

Hi @ctgraham
Here is an example of things going wrong. The references are broken down incorrectly due to the extra line spacing:
https://tidsskrift.dk/mediekultur/article/view/106133
Best
Niels Erik

ctgraham · December 17, 2019, 2:27pm

This suggests that when the references were pasted into the textbox, there were line breaks within the references themselves. This tool requires “each reference on a new line so that it can be extracted”.

I remember making an attempt to guess whether the references were separated by one newline or by two newlines, in order to accommodate references which might have internal line breaks, but inconsistent data led me to ultimately fall back on just enforcing the instructions. The author or editor will need to clean up the reference list to ensure that each reference is on its own line.
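For illustration only, a guess of that kind might look like the Python sketch below. The function name `split_references` and the exact rule (blank lines win if any are present) are assumptions; this is not the code that was tried in pkp-lib.

```python
import re


def split_references(raw):
    """Split pasted reference text into one string per reference."""
    # Normalise line endings and drop surrounding whitespace.
    text = raw.replace('\r\n', '\n').replace('\r', '\n').strip()
    if not text:
        return []
    if re.search(r'\n[ \t]*\n', text):
        # Blank lines are present: treat them as the separators and
        # rejoin any line breaks inside a single reference.
        blocks = re.split(r'\n(?:[ \t]*\n)+', text)
        return [' '.join(line.strip() for line in block.split('\n'))
                for block in blocks]
    # No blank lines: fall back to one reference per line.
    return [line.strip() for line in text.split('\n')]


print(split_references('Smith, J. (2001).\n  A long title.\n\nJones, K. (2005). Title.'))
```

The weakness shows up with mixed input: if only some references are separated by blank lines, as in `'A\nB\n\nC'`, the sketch merges `A` and `B` into one reference, which is the kind of inconsistency that makes enforcing one reference per line the safer rule.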

StephenMAD · June 26, 2020, 12:53pm

When I parse the references of an article and then look at the source of the article page in my browser, all the “br” tags have been changed to “li” tags. It is certainly more elegant, but it causes problems for some databases that want to extract the metadata of the articles.

Where can I find the code to modify so that the “br” tags are not changed?

In general, what can I do to avoid this change generated by the “extract and save references” button?

asmecher · June 26, 2020, 2:53pm

Hi @StephenMAD,

It’s best not to post the same content multiple times, as it clutters the forum. Someone will respond to your other post: Change format of references - OAI - OJS 3

Thanks,
Alec Smecher
Public Knowledge Project Team

StephenMAD · June 26, 2020, 2:55pm

Sure. Sorry for that and thank you for your advice.