How does OJS treat Title and Subtitle when exporting data?

luizborges · April 23, 2021, 12:15pm

In OJS 3.3.0.3, we now have the subtitle field. My question is:
When the export target (OAI-PMH, DOAJ, etc.) only has a Title field, are Title and Subtitle always concatenated, or is the Subtitle ignored?

This is important because if we have a title like “Something something: Questions regarding something”, we need to know whether we should type that colon ourselves or not, and how we should handle it in general (also taking AACR2 into account).

rcgillis · April 24, 2021, 1:44pm

Hi @luizborges,

Which specific version of OJS are you using? The behavior may vary depending on the export mechanism used (e.g. OAI-PMH, QuickSubmit, DOAJ).

-Roger
PKP Team

luizborges · April 24, 2021, 1:56pm

My mistake, I forgot to mention my full version, OJS 3.3.0.3.
But I don’t think this affects how subtitles are handled, right? Since they are new to OJS3.

rcgillis · April 25, 2021, 10:00am

Hi @luizborges,

I have only been able to test the DOAJ export so far, and I did it in OJS 3.3. I asked about the version so that we would be testing the same thing, since the behavior could well differ between versions. What I found is that it didn’t export the subtitle field at all. Here’s an example of the DOAJ XML export:

<records xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://doaj.org/static/doaj/doajArticles.xsd">
  <record>
    <language>eng</language>
    <publisher>PKP Publishing Services</publisher>
    <journalTitle>OJS3 Testdrive Journal</journalTitle>
    <eissn>2049-3630</eissn>
    <publicationDate>2021-03-18</publicationDate>
    <volume>1</volume>
    <issue>3</issue>
    <doi>10.1234/td.v1i3.722</doi>
    <publisherRecordId>1039</publisherRecordId>
    <title language="eng">Effectiveness of influenza vaccination for healthy adults</title>
    <authors>
      <author>
        <name>Vitaliy Bezsheiko</name>
        <affiliationId>0</affiliationId>
      </author>
    </authors>
    <affiliationsList>
      <affiliationName affiliationId="0">Bogomolets National Medical University</affiliationName>
    </affiliationsList>
    <abstract language="eng">
Vaccines for a long time have been used to prevent influenza. They are often recommended for healthy adults before an influenza season, despite the low risk of complications due to this infection in them. The article reviews the rationale for such recommendations.
</abstract>
    <fullTextUrl format="html">https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/article/view/722</fullTextUrl>
    <keywords language="eng">
      <keyword>influenza</keyword>
      <keyword>vaccine</keyword>
      <keyword>adults</keyword>
      <keyword>complications</keyword>
    </keywords>
  </record>
</records>

And here it is showing on our test instance:

[Screenshot: the article on the test instance (Screen Shot 2021-04-25 at 6.47.17 AM)]

So, not only is it not concatenating - it’s not even being included in the export, so far as I can tell. This might be a bug, or it might be intentional, but I wanted to ask: are there instances where you notice the two fields concatenating in other export utilities? And, if so, could you elaborate further on this?

As noted, I haven’t tested other export mechanisms yet; that would take some time. I will have to follow up with our Dev Team on this and get back to you.

-Roger
PKP Team

luizborges · April 26, 2021, 10:59am

@rcgillis actually we are still considering how we are going to use the subtitle field, and whether we will use it at all. We manage journals in linguistics and some other humanities areas, where the subtitle is very important. For now we will keep everything under title, since we want all that information to be visible at every endpoint (I know one of our journals uses or used DOAJ, a few of them use the Crossref export for DOIs, and a lot of indexers rely on OAI for harvesting data…).

asmecher · April 26, 2021, 3:28pm

Hi @luizborges,

Each part of the system that formats a title for a 3rd-party (export) format is coded separately. It might use just the title field, or it might concatenate title and subtitle; ideally, the 3rd-party format supports separate title and subtitle fields and both can be provided separately.

In the case of the Dublin Core format (used by OAI-PMH), for example, the code calls getFullTitle:

https://github.com/pkp/ojs/blob/main/plugins/metadata/dc11/filter/Dc11SchemaArticleAdapter.inc.php#L84


$oaiDao = DAORegistry::getDAO('OAIDAO'); /* @var $oaiDao OAIDAO */
$journal = $oaiDao->getJournal($article->getData('contextId'));
$section = $oaiDao->getSection($article->getSectionId());
if ($article instanceof Submission) { /* @var $article Submission */
    $issue = $oaiDao->getIssue($article->getCurrentPublication()->getData('issueId'));
} else {
    $issue = null;
}

$dc11Description = $this->instantiateMetadataDescription();

// Title
$this->_addLocalizedElements($dc11Description, 'dc:title', $article->getFullTitle(null));

// Creator
$authors = $article->getAuthors();
foreach ($authors as $author) {
    $dc11Description->addStatement('dc:creator', $author->getFullName(false, true));
}

// Subject

This will include the subtitle.

Regards,
Alec Smecher
Public Knowledge Project Team

luizborges · April 26, 2021, 7:04pm

Hello @asmecher,
This is also what I wanted to know, how it concatenates the data:

	static function concatTitleFields($fields) {
		// Set the characters that will avoid the use of
		// a colon between title and subtitle.
		$avoidColonChars = array('?', '!', '/', '&');

		// if the first field ends in a character in $avoidColonChars,
		// concat with a space, otherwise use a colon.
		// Check for any of these characters in
		// the last position of current full title value.
		if (in_array(substr($fields[0], -1, 1), $avoidColonChars)) {
			$fullTitle = join(' ', $fields);
		} else {
			$fullTitle = join(': ', $fields);
		}

		return $fullTitle;
	}

luizborges · April 26, 2021, 7:06pm

Also, I would add a colon (':') to $avoidColonChars, so that someone who follows AACR2 and types the colon at the end of the title doesn’t end up with two colons in a row…
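To make the effect of that change concrete, here is a minimal sketch. The function name concatTitleFieldsSketch and the idea of passing the character list as a parameter are assumptions made for this illustration; this is not the code shipped with OJS.

```php
<?php
// Hypothetical helper for illustration; it is not the code shipped with OJS.
// The list of characters is passed in so both variants can be compared.
function concatTitleFieldsSketch(array $fields, array $avoidColonChars): string
{
    // Last byte of the title. All characters in the lists below are ASCII,
    // so a byte-wise check is sufficient here.
    $lastChar = substr($fields[0], -1, 1);

    // Join with a plain space if the title already ends in one of the
    // listed characters, otherwise with ': '.
    $separator = in_array($lastChar, $avoidColonChars, true) ? ' ' : ': ';
    return implode($separator, $fields);
}

$currentList  = ['?', '!', '/', '&'];
$proposedList = ['?', '!', '/', '&', ':'];
$fields = ['Something:', 'Is some thing'];

echo concatTitleFieldsSketch($fields, $currentList), "\n";  // Something:: Is some thing
echo concatTitleFieldsSketch($fields, $proposedList), "\n"; // Something: Is some thing
```

With the current list, the title's own colon is followed by the inserted one; with ':' in the list, only a space is added.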

asmecher · April 27, 2021, 6:15pm

Hi @luizborges,

This is somewhat off-topic, but I’m hesitant to extend that implementation further without considering other languages. The current implementation is quite Latin-centric, and assumes a left-to-right language. Maybe there is a 3rd-party implementation or standard out there?

Regards,
Alec Smecher
Public Knowledge Project Team

luizborges · April 27, 2021, 6:31pm

@asmecher this is the current code called by getFullTitle(). I only suggested adding the colon to $avoidColonChars because otherwise the code would append a second colon after the one that already ends the title. With the change, a trailing colon is detected and no separator is added, just as with the other punctuation.

leonardof · April 27, 2021, 8:50pm

When I was a translator in GNOME, sometimes there would be messages like %{title}s: %{subtitle}s with sprintf, with a comment to let translators know what it meant. The good news: this solves the hard-coded Latin-centrism. The bad news: it gives translators room to break stuff. One way of mitigating this was a hook that ran a lot of checks on commits when they were pushed. But that was ~10 years ago. Since then, “damned lies” (their l10n web app) has come to be used instead of git directly, so maybe damned lies itself does these checks nowadays.

asmecher · April 30, 2021, 4:40pm

Hi @leonardof and @luizborges ,

We could perhaps add a new locale key as @leonardof proposes to allow each language to specify how title and subtitle are joined, e.g. for Latin languages:

{$title}: {$subTitle}

But the problem is extending this approach to include the logic that excludes the : character when the title ends in a particular list of characters (currently ?!/&, and @luizborges proposes adding ;).

For example, we could add three new locale keys…

A concatenation with separator: {$title}: {$subTitle}
A concatenation without separator: {$title} {$subTitle}
A list of terminal characters that indicates the separator should be skipped: ?!/&

…but again, isn’t this somewhat Latin-centric, as it chooses between concatenations based only on the last character of the title?
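As a rough sketch only, the three keys could fit together as below. The array keys (withSeparator, withoutSeparator, terminalChars), the function name joinTitle and the made-up xx_XX locale are all hypothetical; nothing here exists in OJS locale files.

```php
<?php
// Hypothetical per-locale rules; key names and the xx_XX values are invented.
$joinRules = [
    'en_US' => [
        'withSeparator'    => '{$title}: {$subTitle}',
        'withoutSeparator' => '{$title} {$subTitle}',
        'terminalChars'    => '?!/&',
    ],
    'xx_XX' => [
        'withSeparator'    => '{$title} / {$subTitle}',
        'withoutSeparator' => '{$title} {$subTitle}',
        'terminalChars'    => '？！', // full-width punctuation, multi-byte in UTF-8
    ],
];

function joinTitle(array $rules, string $title, string $subTitle): string
{
    if ($subTitle === '') {
        return $title;
    }

    // Split the terminal-character list into individual UTF-8 characters.
    $terminalChars = preg_split('//u', $rules['terminalChars'], -1, PREG_SPLIT_NO_EMPTY);

    $template = $rules['withSeparator'];
    foreach ($terminalChars as $char) {
        // Compare the trailing bytes of the title with the whole character;
        // this is safe for valid UTF-8 input.
        if (substr($title, -strlen($char)) === $char) {
            $template = $rules['withoutSeparator'];
            break;
        }
    }

    // strtr() replaces both placeholders in one pass and does not rescan
    // the substituted text.
    return strtr($template, ['{$title}' => $title, '{$subTitle}' => $subTitle]);
}

echo joinTitle($joinRules['en_US'], 'Something something', 'Questions regarding something'), "\n";
echo joinTitle($joinRules['en_US'], 'Why bother?', 'A reply'), "\n";
echo joinTitle($joinRules['xx_XX'], 'Title？', 'Sub'), "\n";
```

The decision still depends only on how the title ends, so the concern about Latin-centric logic applies to this sketch as well; it only moves the characters and templates into locale data.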

Regards,
Alec Smecher
Public Knowledge Project Team

luizborges · April 30, 2021, 5:30pm

I think this problem is really hard to solve, and your idea seems to work really nicely at the small cost of 3 variables. Those keys need to be REALLY well defined to prevent mistakes when localizing them.

EDIT: Regarding using just the last character to decide on the concatenation: there is no other way of doing it without resorting to some regex that might not even work in some contexts. I know Portuguese grammar, some English grammar, and library cataloguing codes like AACR2. Some other language might work in a completely different way, but at the very least it would be possible to concatenate everything and get a full title. Also, instead of single characters, the list could hold strings to match at the end of the Title. I’m not sure how useful that would be, but it wouldn’t be too hard to implement (instead of a string of chars, it would be a string of joined strings with some standard delimiter, or something that packs/unpacks arrays of strings for safety).

Also, I proposed adding : not ; to the list. Otherwise, we have the following issue:

$title = "Something:";
$subTitle = "Is some thing";

Is the last character of the title in the list ?!/&?
No, so build the new string with the {$title}: {$subTitle} template and get "Something:: Is some thing".

My idea is just to prevent duplication of the last title character (:)
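The string-suffix variant mentioned in the EDIT could look roughly like this. It is a sketch under assumptions: the suffix list is packed as JSON (one possible way to pack/unpack it safely), and the names endsWithAnySuffix and joinWithSuffixCheck, as well as the "..." entry, are hypothetical.

```php
<?php
// Hypothetical packed list of title endings that should suppress the ': '
// separator. JSON avoids having to reserve a delimiter character.
$packedSuffixes = '["?","!","/","&",":","..."]';

function endsWithAnySuffix(string $title, array $suffixes): bool
{
    foreach ($suffixes as $suffix) {
        // Compare the tail of the title with the whole suffix.
        if ($suffix !== '' && substr($title, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

function joinWithSuffixCheck(string $title, string $subTitle, string $packedSuffixes): string
{
    // Unpack the list; fall back to an empty list if the JSON is invalid.
    $suffixes = json_decode($packedSuffixes, true);
    if (!is_array($suffixes)) {
        $suffixes = [];
    }

    $separator = endsWithAnySuffix($title, $suffixes) ? ' ' : ': ';
    return $title . $separator . $subTitle;
}

echo joinWithSuffixCheck('Something:', 'Is some thing', $packedSuffixes), "\n"; // Something: Is some thing
echo joinWithSuffixCheck('Wait...', 'What happened', $packedSuffixes), "\n";   // Wait... What happened
echo joinWithSuffixCheck('Something', 'Is some thing', $packedSuffixes), "\n";  // Something: Is some thing
```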

asmecher · April 30, 2021, 8:03pm

Hi all,

I was hoping CSL would have an implementation of this that we could lean on. There has been some discussion, but apparently nothing is in the spec yet.

Regards,
Alec Smecher
Public Knowledge Project Team

Ramfra · November 4, 2024, 11:44am

Hi,

I’m wondering if there has been any progress on this issue, in particular with regard to exports to DOAJ. We host a number of journals that use the subtitle field, and it is problematic that this metadata doesn’t get passed on to DOAJ. Is our best option at this point to advise journals not to use subtitles?

OJS 3.3.0-19 and DOAJ plugin version 1.1.0.0

asmecher · November 6, 2024, 10:53pm

Hi @Ramfra,

Try this patch, and let me know if it works. If so, I’ll get it included in the next release.

diff --git a/plugins/importexport/doaj/filter/DOAJJsonFilter.inc.php b/plugins/importexport/doaj/filter/DOAJJsonFilter.inc.php
index c23c470c6d..0e990624d1 100644
--- a/plugins/importexport/doaj/filter/DOAJJsonFilter.inc.php
+++ b/plugins/importexport/doaj/filter/DOAJJsonFilter.inc.php
@@ -87,7 +87,7 @@ class DOAJJsonFilter extends NativeImportExportFilter {
                if (!empty($issueNumber)) $article['bibjson']['journal']['number'] = $issueNumber;
 
                // Article title
-               $article['bibjson']['title'] = $pubObject->getTitle($pubObject->getLocale());
+               $article['bibjson']['title'] = $publication->getLocalizedFullTitle($pubObject->getLocale());
                // Identifiers
                $article['bibjson']['identifier'] = array();
                // DOI

Regards,
Alec Smecher
Public Knowledge Project Team

Ramfra · November 7, 2024, 3:04pm

Hi @asmecher,
Thanks so much for your help with this!
We tried applying the patch, but when I export from the DOAJ plugin, the XML still only includes the main article title. We think we did it correctly, but is there any info we can provide that would help you identify whether it isn’t working due to user (our) error or due to the patch itself?

asmecher · December 18, 2024, 11:22pm

Hi @Ramfra,

With apologies for the wait – it looks like the DOAJ plugin exports in both JSON and XML form; the patch above only affects the JSON export, and you’re using the XML. Could you try this additional patch and confirm if it fixes the XML as you expect?

diff --git a/plugins/importexport/doaj/filter/DOAJXmlFilter.inc.php b/plugins/importexport/doaj/filter/DOAJXmlFilter.inc.php
index 7c697670cd..ad36ef886f 100644
--- a/plugins/importexport/doaj/filter/DOAJXmlFilter.inc.php
+++ b/plugins/importexport/doaj/filter/DOAJXmlFilter.inc.php
@@ -116,7 +116,7 @@ class DOAJXmlFilter extends NativeExportFilter {
                        $type = $publication->getLocalizedData('type', $publication->getData('locale'));
                        if (!empty($type)) $recordNode->appendChild($node = $doc->createElement('documentType', htmlspecialchars($type, ENT_COMPAT, 'UTF-8')));
                        // Article title
-                       $articleTitles = (array) $publication->getData('title');
+                       $articleTitles = (array) $publication->getFullTitles();
                        if (array_key_exists($publication->getData('locale'), $articleTitles)) {
                                $titleInArticleLocale = $articleTitles[$publication->getData('locale')];
                                unset($articleTitles[$publication->getData('locale')]);

Regards,
Alec Smecher
Public Knowledge Project Team

Ramfra · January 17, 2025, 12:08pm

Hi @asmecher,
Thanks so much for this additional patch. We tried it and it works! Am I correct in thinking that the next version of the plugin will include the patch?

This may be out of scope for this issue, but I noticed that the metadata for articles previously registered with DOAJ doesn’t appear to be corrected when I re-register them through the plugin, only when I upload the XML directly to DOAJ. I see a prior post about this on the Forum (DOAJ metadata update possible?) but it’s unclear to me whether updating metadata through the plugin is - or should be - possible. Do you know if it is?
Regards,
Ramana

asmecher · January 17, 2025, 2:51pm

Hi @Ramfra,

Yes, the change will be included in the next release of the plugin.

As for your related question – I don’t know that plugin well, but I don’t believe we automatically update metadata with DOAJ if it’s been changed after the initial deposit.

Regards,
Alec Smecher
Public Knowledge Project Team