How does OJS treat Title and Subtitle when exporting data?

OJS 3.3.0.3 now has a subtitle field, so my question is:
Are Title and Subtitle concatenated whenever the export target (OAI-PMH, DOAJ, etc.) only has a Title field? Or is the Subtitle ignored when exporting to a format that only has Title?

This is important because if we have a title like “Something something: Questions regarding something”, we need to know whether we should type that colon ourselves or not, or how else we should handle it (also taking AACR2 into account).

Hi @luizborges,

What specific version of OJS are you using? The behaviour may vary based on the export mechanism used (e.g. OAI-PMH, QuickSubmit, DOAJ, etc.).

-Roger
PKP Team

My mistake, I forgot to mention my full version, OJS 3.3.0.3.
But I don’t think this affects how subtitles are handled, right? Since they are new to OJS3.

Hi @luizborges,

So far I have only been able to test the DOAJ export, which I did in OJS 3.3. I asked about the version to make sure we were on the same page when testing, since this could very well behave differently between versions. What I found is that the subtitle field was not exported at all. Here’s an example of the DOAJ XML export:

<records xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://doaj.org/static/doaj/doajArticles.xsd">
  <record>
    <language>eng</language>
    <publisher>PKP Publishing Services</publisher>
    <journalTitle>OJS3 Testdrive Journal</journalTitle>
    <eissn>2049-3630</eissn>
    <publicationDate>2021-03-18</publicationDate>
    <volume>1</volume>
    <issue>3</issue>
    <doi>10.1234/td.v1i3.722</doi>
    <publisherRecordId>1039</publisherRecordId>
    <title language="eng">Effectiveness of influenza vaccination for healthy adults</title>
    <authors>
      <author>
        <name>Vitaliy Bezsheiko</name>
        <affiliationId>0</affiliationId>
      </author>
    </authors>
    <affiliationsList>
      <affiliationName affiliationId="0">Bogomolets National Medical University</affiliationName>
    </affiliationsList>
    <abstract language="eng">
Vaccines for a long time have been used to prevent influenza. They are often recommended for healthy adults before an influenza season, despite the low risk of complications due to this infection in them. The article reviews the rationale for such recommendations.
</abstract>
    <fullTextUrl format="html">https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/article/view/722</fullTextUrl>
    <keywords language="eng">
      <keyword>influenza</keyword>
      <keyword>vaccine</keyword>
      <keyword>adults</keyword>
      <keyword>complications</keyword>
    </keywords>
  </record>
</records>

And here it is showing on our test instance:

[Image: Screen Shot 2021-04-25 at 6.47.17 AM]

So, not only is it not concatenating - it’s not even being included in the export, so far as I can tell. This might be a bug, or it might be intentional, but I wanted to ask: are there instances where you notice the two fields concatenating in other export utilities? And, if so, could you elaborate further on this?

And, as I’ve noted, I haven’t tested other export mechanisms; that would take some time. I will have to follow up with our Dev Team on this and get back to you.

-Roger
PKP Team

@rcgillis actually we are still considering how we are going to use the subtitle field, and whether we will use it at all. We manage journals in linguistics and some other humanities areas, where the subtitle is very important. For now we will keep everything under title, since we want all that information to be visible at every endpoint (I know one of our journals uses/used DOAJ, a few of them use the Crossref export for DOIs, and a lot of indexers rely on OAI for harvesting data…).

Hi @luizborges,

Each part of the system that needs to format a title into a 3rd-party (export) format is coded separately, so it might choose to get just the title field, or concatenate title and subtitle, or ideally the 3rd-party format supports separate title and subtitle fields and both can be provided separately.

In the case of the Dublin Core format (used by OAI-PMH), for example, the code calls getFullTitle, which will include the subtitle.

Regards,
Alec Smecher
Public Knowledge Project Team

Hello @asmecher,
This is also what I wanted to know, how it concatenates the data:

	static function concatTitleFields($fields) {
		// Set the characters that will avoid the use of
		// a colon between title and subtitle.
		$avoidColonChars = array('?', '!', '/', '&');

		// if the first field ends in a character in $avoidColonChars,
		// concat with a space, otherwise use a colon.
		// Check for any of these characters in
		// the last position of current full title value.
		if (in_array(substr($fields[0], -1, 1), $avoidColonChars)) {
			$fullTitle = join(' ', $fields);
		} else {
			$fullTitle = join(': ', $fields);
		}

		return $fullTitle;
	}

Also, I would add a colon (':') to $avoidColonChars, so that someone who follows AACR2 and types the colon at the end of the title does not end up with two colons in a row…
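A minimal sketch of that change, written as a standalone function and not as a patch to OJS. The name concatTitleFieldsWithColon is hypothetical. It also swaps substr for mb_substr so that a multibyte last character is read whole; that swap is my assumption, not part of the proposal, and it needs the mbstring extension.

```php
<?php
// Hypothetical standalone variant of concatTitleFields(); not the OJS code.
function concatTitleFieldsWithColon(array $fields): string {
    // ':' is added so a title that already ends in a colon
    // is not given a second one.
    $avoidColonChars = ['?', '!', '/', '&', ':'];

    // Last character of the first field, read multibyte-safe.
    $lastChar = mb_substr($fields[0], -1, 1, 'UTF-8');

    // Join with a space if the title already ends in listed punctuation,
    // otherwise join with a colon and a space.
    $separator = in_array($lastChar, $avoidColonChars, true) ? ' ' : ': ';
    return implode($separator, $fields);
}

echo concatTitleFieldsWithColon(['Something:', 'Is some thing']), "\n";
// prints "Something: Is some thing"
echo concatTitleFieldsWithColon(['Something something', 'Questions regarding something']), "\n";
// prints "Something something: Questions regarding something"
```

Like the original, this assumes $fields has at least one entry and uses the same separator between every pair of fields.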

Hi @luizborges,

This is somewhat off-topic, but I’m hesitant to extend that implementation further without considering other languages. The current implementation is quite Latin-centric, and assumes a left-to-right language. Maybe there is a 3rd-party implementation or standard out there?

Regards,
Alec Smecher
Public Knowledge Project Team

@asmecher this is the current code called by getFullTitle(). I only suggested adding the colon to $avoidColonChars because otherwise the code appends a second colon after a title that already ends in one. With the addition, a trailing colon is treated like the other punctuation in the list and the fields are joined with a space.

When I was a translator in GNOME, sometimes there would be messages like %{title}s: %{subtitle}s used with sprintf, with a comment to let translators know what it meant. Good news: this solves the hard-coded Latin-centeredness. Bad news: it gives translators room to break stuff. One way of mitigating this was a hook that ran a lot of checks on commits when they were pushed. But that was ~10 years ago. Since then, “damned lies” (their l10n web app) began to be used instead of git directly, so maybe damned lies itself does these checks nowadays.
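To make the pattern concrete, here is a rough sketch using PHP's sprintf with numbered placeholders. The $joinFormats array, joinTitle and isUsableJoinFormat are all made up for the example: the array stands in for a real translation catalogue, and the check is only a guess at the kind of test such a commit hook might run.

```php
<?php
// Hypothetical per-locale join formats. In a real project these would be
// translatable strings with a translator comment explaining that
// %1$s is the title and %2$s is the subtitle.
$joinFormats = [
    'en' => '%1$s: %2$s',
    'fr' => '%1$s : %2$s', // French typography puts a space before the colon
];

// Fill the format with the two fields.
function joinTitle(string $format, string $title, string $subtitle): string {
    return sprintf($format, $title, $subtitle);
}

// Guard against a translation that dropped one of the placeholders.
function isUsableJoinFormat(string $format): bool {
    return strpos($format, '%1$s') !== false
        && strpos($format, '%2$s') !== false;
}

echo joinTitle($joinFormats['en'], 'Something something', 'Questions regarding something'), "\n";
// prints "Something something: Questions regarding something"
```

Numbered placeholders let a translator reorder the two fields as well as change the separator, which is where the room to break things comes from.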

Hi @leonardof and @luizborges ,

We could perhaps add a new locale key as @leonardof proposes to allow each language to specify how title and subtitle are joined, e.g. for Latin languages:

{$title}: {$subTitle}

But the problem is extending this approach to include the logic that excludes the : character when the title ends in a particular list of characters (currently ?!/&, and @luizborges proposes adding ;).

For example, we could add three new locale keys…

  • A concatenation with separator: {$title}: {$subTitle}
  • A concatenation without separator: {$title} {$subTitle}
  • A list of terminal characters that indicates the separator should be skipped: ?!/&

…but again, isn’t this somewhat Latin-centric, as it chooses between concatenations based only on the last character of the title?
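As a rough sketch of how those three keys could fit together (the key names, the $locale array and joinWithLocale are all invented for illustration; this is not existing OJS code and assumes the mbstring extension):

```php
<?php
// Hypothetical locale entries; the key names are invented for this sketch.
$locale = [
    'title.join.withSeparator'      => '{$title}: {$subTitle}',
    'title.join.withoutSeparator'   => '{$title} {$subTitle}',
    'title.join.skipSeparatorAfter' => '?!/&',
];

function joinWithLocale(array $locale, string $title, string $subTitle): string {
    // Split the terminal-character list into single (multibyte) characters.
    $skipChars = mb_str_split($locale['title.join.skipSeparatorAfter'], 1, 'UTF-8');
    $lastChar = mb_substr($title, -1, 1, 'UTF-8');

    // Choose the template based only on the title's last character.
    $key = in_array($lastChar, $skipChars, true)
        ? 'title.join.withoutSeparator'
        : 'title.join.withSeparator';

    // Fill in the placeholders of the chosen template.
    return strtr($locale[$key], ['{$title}' => $title, '{$subTitle}' => $subTitle]);
}

echo joinWithLocale($locale, 'Something', 'Is some thing'), "\n";
// prints "Something: Is some thing"
echo joinWithLocale($locale, 'What is a title?', 'An answer'), "\n";
// prints "What is a title? An answer"
```

The sketch keeps the limitation raised above: the choice between the two templates still depends only on the last character of the title.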

Regards,
Alec Smecher
Public Knowledge Project Team

I think this problem is really hard to solve, and your idea seems to work really nicely at the small cost of 3 variables. Those keys need to be REALLY well defined to prevent mistakes when localizing them.

EDIT: Regarding using just the last character to decide on the concatenation: there is no other way of doing it without resorting to some regex that might not even work in some contexts. I know Portuguese grammar, some English grammar and librarian codes like AACR2. Some other language might work in a completely different way, but at the very least there is the possibility of concatenating everything and getting a full title. Also, instead of characters, maybe the list could check for strings at the end of Title. I am not sure how useful that would be, but it wouldn’t be too hard to implement (instead of a string of chars, it would be a string of joined strings with some standard delimiter, or something to pack/unpack arrays of strings for safety).
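A small sketch of the strings-instead-of-characters idea, assuming the list is stored as one string with | as the delimiter. The delimiter, the example list and the helper name endsWithAny are all hypothetical.

```php
<?php
// Hypothetical helper: true if $title ends with any of the given strings.
function endsWithAny(string $title, array $suffixes): bool {
    foreach ($suffixes as $suffix) {
        // Skip empty entries; compare the tail of the title byte for byte.
        if ($suffix !== '' && substr($title, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// The list as it might be stored in a single locale value,
// unpacked with the assumed delimiter.
$suffixes = explode('|', '?|!|/|&|:|...');

$title = 'To be continued...';
$separator = endsWithAny($title, $suffixes) ? ' ' : ': ';
echo $title . $separator . 'A sequel', "\n";
// prints "To be continued... A sequel"
```

A byte-wise comparison is enough here because a suffix either matches the end of the title exactly or it does not, so no multibyte handling is needed for the match itself.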

Also, I proposed adding : not ; to the list. Otherwise, we have the following issue:

$title = "Something:";
$subTitle = "Is some thing";

Is the last character of the title included in ?!/&?
No, so the new string is built from the {$title}: {$subTitle} template, and we get "Something:: Is some thing".

My idea is just to prevent duplication of the last title character (:)

Hi all,

I was hoping CSL would have an implementation of this that we can lean on, and there has been some discussion, but apparently nothing in the spec yet.

Regards,
Alec Smecher
Public Knowledge Project Team