Google Scholar indexing "too much" information

dnoesgaard · June 25, 2021, 9:07am

Hi, I apologize in advance if this is slightly off-topic, but I’m struggling to find a solution to this problem.

I work for GBIF, and we rely on Google Scholar for picking up citations of data from GBIF. One journal, the European Journal of Taxonomy, which uses OJS and is indexed by GS, also has a button/link to GBIF on every article landing page. Unfortunately, this causes nearly every single paper from the journal to show up in our searches as if GBIF were cited in the paper.

Example: One new genus and nineteen new species of ground spiders (Araneae: Gnaphosidae) from Iran, with other taxonomic considerations | European Journal of Taxonomy

I want to advise the journal, but I’m not sure what they could do to exclude this from the information being indexed by GS, except remove the button.

I realize this might be a stretch, but I would appreciate any help anyone could provide.

rcgillis · June 25, 2021, 12:16pm

Hi @dnoesgaard,

I’m not all that familiar with GBIF and how it is being used by the journal, but is the dataset meant to be part of the journal article, like a complementary dataset that is related to the article, or is it just something that is cited in the article? It’s not surprising that GS indexing is picking it up, because it is being shown as a galley (at least in the example you’ve given) and not as a supplementary file, which is a more common way of linking a dataset with a publication.

-Roger
PKP Team

dnoesgaard · June 25, 2021, 12:40pm

Hi Roger, thanks for your comments.

To be honest, I haven’t really looked into why the button is there and what it does. It seems to simply link to a dataset search on our website for the DOI of the paper. I guess the logic is that any digitized taxonomic treatments in GBIF would carry the same DOI, and I have now found some examples of this.

In any case, the inclusion of the phrase “GBIF” in GS is as useful as the phrase “PDF”, illustrated more clearly here:

https://europeanjournaloftaxonomy.eu/index.php/ejt/issue/view/1057

I wonder if there is any kind of meta-tagging to avoid indexing?

rcgillis · June 25, 2021, 3:22pm

Hi @dnoesgaard,

It looks to me as though it is being included as a galley, which is intended to be used for versions of the article in different formats (e.g. PDF, HTML), not for supplementary materials that are intended to complement the article itself. Because it is set up as a galley format, this tells GS that it is a version of the article, so you’d have to disassociate it from being an actual galley file, and distinguish it in some way, in order to get GS to crawl it differently. You can get a bit of a sense of this here: Authoring - there is a mechanism to upload a dataset as an actual file type. I will try to hunt down some journals that publish datasets to try to give a more concrete example.
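To see which links on a landing page are presented as galleys, a rough check like the one below might help. It is only a sketch: it assumes galley links carry the CSS class `obj_galley_link` (an assumption about the theme's markup, which may differ on a customized site), the sample HTML and `example.org` URLs are made up for illustration, and it says nothing about how GS itself decides what to index.

```python
from html.parser import HTMLParser


class LandingPageScan(HTMLParser):
    """Collect citation_* meta tags and galley-style link labels from a page."""

    def __init__(self):
        super().__init__()
        self.citation_meta = {}   # meta name -> list of content values
        self.galley_labels = []   # visible text of links styled as galleys
        self._in_galley = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag == "meta" and name.startswith("citation_"):
            self.citation_meta.setdefault(name, []).append(attrs.get("content") or "")
        # Assumption: galley links are marked with the class "obj_galley_link".
        elif tag == "a" and "obj_galley_link" in (attrs.get("class") or "").split():
            self._in_galley = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_galley:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_galley:
            self.galley_labels.append("".join(self._buffer).strip())
            self._in_galley = False


def scan(html):
    """Return (citation meta tags, galley link labels) found in the HTML."""
    parser = LandingPageScan()
    parser.feed(html)
    parser.close()
    return parser.citation_meta, parser.galley_labels


# Hypothetical landing page: one real galley (PDF) and one button set up as a galley.
SAMPLE = """
<html><head>
<meta name="citation_title" content="Example article">
<meta name="citation_pdf_url" content="https://example.org/article/download/1/2">
</head><body>
<a class="obj_galley_link pdf" href="https://example.org/article/view/1/2">PDF</a>
<a class="obj_galley_link" href="https://example.org/article/view/1/3">GBIF</a>
</body></html>
"""

if __name__ == "__main__":
    meta, labels = scan(SAMPLE)
    print(meta)
    print(labels)
```

If a label such as "GBIF" appears in the galley list next to "PDF", that would be consistent with the button being set up as a galley rather than as a supplementary file.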

I’m not sure exactly how to avoid this. And since GS indexing is automated, it’s often hard to go back and get corrections reflected in GS, because it takes GS a while to re-index sites. It might be worth having a look at our GS guide here: Google Scholar Guide

This might give you a better sense of how GS interacts with OJS. But I will look into this further and consult with a few colleagues to see if I can recommend a more robust solution.

-Roger
PKP Team