[OJS 3] Disallow excessive spaces in names? Building author indexes

Ph_We · September 17, 2019, 7:24am

I don’t even know whether it would be a bug or a feature request))
We’ve noticed that any excessive space in an entered contributor name (leading, trailing, or double) makes this name different from the same name entered without the extra spaces. This results in the creation of a new author profile, even if the email is the same, which makes the problem of duplicate author profiles even worse.

I might be mistaken, but I do not know of any case (anywhere in the world) that would make preserving those spaces necessary. So I would propose trimming them at the entry level…
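
To illustrate the kind of trimming I mean, here is a minimal sketch in Python. It is for illustration only: OJS itself is written in PHP, and the function name `normalize_name` is hypothetical, not something that exists in OJS.

```python
def normalize_name(name: str) -> str:
    """Remove leading/trailing whitespace and collapse internal runs to one space.

    Hypothetical helper for illustration; not part of OJS.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings at both ends.
    return " ".join(name.split())
```

With this, "  John   Smith " and "John Smith" would compare as the same name.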

Ph_We · September 19, 2019, 6:45am

OK, since I think it may be important, I’ve created the issue:

Ph_We · September 20, 2019, 1:22pm

Hi @asmecher,

Thanks for answering in the issue. But something still eludes me.
You say that author records are intentionally not deduplicated. I am curious why you decided not to do this.

Furthermore, when we discussed the problem of author identification 2-3 years ago, you said they are identified by names+emails (AFAIR). You also said that identification by ORCID would be the best solution in the future. So in any case, they should be deduplicated if those two coincide.
Now I realize you might have meant user records and not author records, which are stored separately. Is this right?

Now I have two questions:

Is it possible to build author indexes based on what we have now?
Will identification by ORCID solve the user/author problem (by ‘connecting’ users & authors in the same author profiles)?

asmecher · September 20, 2019, 7:18pm

Hi @Ph_We,

The authors table stores the author records for all submissions, with no deduplication done. This is to preserve the scholarly record – if an author publishes a document, then changes their institution or name, then publishes a second submission, the change should not apply to the first article.

If you want to generate a list of authors that’s properly deduplicated, you’ll need to use ORCIDs, then query the database for all unique ORCIDs (and potentially the most recent author record for each). OJS 3.x doesn’t include that facility built in; we prefer to delegate providing the author list to ORCID, as it’ll also be able to include authors from different OJS installations. So I think your best bet will be to encourage authors to use ORCID for the sake of properly populating their holdings in ORCID, and if you need a specific report on authors in your database, you can query that up pretty quickly from e.g. phpMyAdmin, or extract it from the authors report plugin.
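
As a rough illustration of the "unique ORCIDs, most recent record for each" idea, here is a small Python sketch over in-memory records. Everything in it is an assumption for the sake of the example: the field names `author_id` and `orcid` and the function name are hypothetical, this is not the actual OJS schema, and it assumes that a higher `author_id` means a more recent record.

```python
def latest_record_per_orcid(records):
    """Return {orcid: record}, keeping the record with the highest author_id.

    Assumes (hypothetically) that each record is a dict with "author_id"
    and an optional "orcid", and that a higher author_id is more recent.
    """
    latest = {}
    for rec in records:
        orcid = rec.get("orcid")
        if not orcid:
            # Records without an ORCID cannot be deduplicated this way.
            continue
        current = latest.get(orcid)
        if current is None or rec["author_id"] > current["author_id"]:
            latest[orcid] = rec
    return latest
```

Records without an ORCID are simply skipped here, which is the main limitation of relying on ORCID alone.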

Regards,
Alec Smecher
Public Knowledge Project Team

Ph_We · September 25, 2019, 12:55pm

Hi @asmecher, this of course makes sense, thank you. However, it does not make my suggestion obsolete. First, if the ‘***/search/authors’ page were fixed, then building the author index based on given and family names alone would be a viable option, especially for journals that cannot rely on ORCID alone, and even if namesakes are not disambiguated.
Also, even if ORCID is used to deduplicate authors, there might still be a problem of managing ‘alternative’ names (if those spaces are still allowed).

And I am still perplexed about which kind of profiles were supposed to be identified by names and emails)

asmecher · September 25, 2019, 7:41pm

Hi @Ph_We,

I think the ***/search/authors page is considered deprecated – it’s not linked anymore from anywhere within OJS (that I’m aware of) and I’d like to remove it at some point. Can you tell me more about how you’re using it?

Regards,
Alec Smecher
Public Knowledge Project Team

habib · September 26, 2019, 8:07am

Hi @Ph_We,
I’m also using ***/search/authors, and I eliminate duplicates in templates/frontend/pages/searchAuthorIndex.tpl. But it’s only a simple workaround.
Regards
habib

Ph_We · September 26, 2019, 2:46pm

Our journal managers use it to get a bird’s-eye view of all the authors who have ever contributed to the journal, because in Users & Roles they can only find those authors who have accounts (users).

I think many journals would like their readers to be able to browse through all the authors, and would like to make ‘***/search/authors’ available in their Nav menus. It is a common page for many journals outside OJS.

So I would humbly propose not to remove it. Instead, it might be fixed to make it really usable:

Strip all unnecessary spaces from names;
Filter authors either by ORCID, or only by their given and family names (and not by their affiliation, as it is now);
Fix the issue with the alphabet, where the first letter is taken from the given name instead of the family name.
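
The three points above can be sketched together as follows. This is a Python illustration only, not OJS code: the record fields `given_name`, `family_name` and `orcid` and the function name `build_author_index` are hypothetical.

```python
from collections import defaultdict


def build_author_index(records):
    """Group deduplicated authors by the first letter of the family name.

    Hypothetical record fields: "given_name", "family_name", optional "orcid".
    Authors are matched by ORCID when present, otherwise by normalized name;
    affiliation is deliberately ignored.
    """
    seen = set()
    index = defaultdict(list)
    for rec in records:
        # Point 1: strip leading/trailing spaces and collapse double spaces.
        given = " ".join(rec.get("given_name", "").split())
        family = " ".join(rec.get("family_name", "").split())
        # Point 2: identify by ORCID, or by given + family name only.
        if rec.get("orcid"):
            key = ("orcid", rec["orcid"])
        else:
            key = ("name", given.casefold(), family.casefold())
        if key in seen:
            continue
        seen.add(key)
        # Point 3: take the letter from the family name, not the given name.
        letter = (family or given)[:1].upper() or "#"
        index[letter].append(f"{family}, {given}" if family else given)
    return {letter: sorted(names) for letter, names in sorted(index.items())}
```

Note that in this sketch an author who has an ORCID on one record and none on another would still be listed twice, and namesakes without ORCIDs would be merged into one entry.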

asmecher · September 26, 2019, 3:31pm

Hi @Ph_We,

I think attempting to disambiguate authors by matching names is always going to be flawed, so I’m hesitant to invest much time in improving/maintaining that approach. However, in addition to ORCIDs, there’s another discussion that seems relevant here: Introduce disambiguating author IDs · Issue #2986 · pkp/pkp-lib · GitHub

I’ve linked that discussion to this thread as well.

Regards,
Alec Smecher
Public Knowledge Project Team