Google Scholar is Not Indexing OJS Documents

I already posted about this issue in the old forums but I wanted to start a new thread here to see if anyone else has suggestions.

I already read through Google Scholar’s troubleshooting page and filled out the manual request form, in which I noted that I was using OJS. All I got back was an automated response saying to give it a few weeks.

This is a serious problem for us at this point and while I know it isn’t the fault of PKP or OJS, I really do not know where else to turn. Many thanks for any assistance you may be able to offer.

Hi @jamilj,

Unfortunately the Google Scholar indexing process is opaque to us; to find out the status of your system, you’ll have to work with Google – though it might be tricky, as they receive too many indexing queries to spend much time on them.

Regards,
Alec Smecher
Public Knowledge Project Team

The weird thing is, Google encourages the use of OJS in its Scholar inclusion guidelines:

Alternatively, if you have the technical expertise to manage your own website, we recommend the Open Journal Systems (OJS) software that’s available for download from the Public Knowledge Project (PKP).

I’ll try to follow their guide to include my OJS journal site and see if they index it. (Last time it took a few weeks for my old PDF repository to appear in Google Scholar.)

@robasempai If you have any success please let me know. I am on my third request for inclusion at the moment and still nothing.

I just submitted my site, and the automated response says this:

Thanks for submitting your website to Google Scholar. Our crawl team is working hard to add new content as quickly as possible, and we appreciate your assistance.

To ensure that your website is indexed, please check the following in your hosting software:

Ensure that your robots.txt doesn’t block /issue/ or /article/ URLs.
Ensure that your abstracts have the following meta-tags: citation_title, citation_authors, citation_journal_title and citation_date. You can check for these by viewing the HTML source for any of the abstract pages.

If your content meets our guidelines, you can generally expect to find it included within the Google Scholar results within 4-6 weeks.

I checked the HTML of one paper and it has these meta-tags.
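
For anyone who would rather not read page source by hand, the two checks from that automated response can be scripted. This is only a minimal sketch using the Python standard library. The tag names come from the response quoted above; treating `citation_author` and `citation_publication_date` as acceptable alternatives, and testing against the `Googlebot` user agent, are my own assumptions. The function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Tag names listed in the automated response, each mapped to the names this
# sketch accepts. The alternative spellings are an assumption, not something
# stated in the response.
REQUIRED_TAGS = {
    "citation_title": {"citation_title"},
    "citation_authors": {"citation_authors", "citation_author"},
    "citation_journal_title": {"citation_journal_title"},
    "citation_date": {"citation_date", "citation_publication_date"},
}


class MetaNameCollector(HTMLParser):
    """Collects the name attribute of every <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name.lower())


def missing_citation_tags(html_text):
    """Return the required tags that do not appear in the HTML source."""
    collector = MetaNameCollector()
    collector.feed(html_text)
    return {
        tag
        for tag, accepted in REQUIRED_TAGS.items()
        if not (accepted & collector.names)
    }


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```

To use it, save the source of one abstract page and the contents of your robots.txt, then pass them in as strings. An empty result from both functions means neither check found a problem; it does not guarantee that Scholar will index the site.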

When they say it takes 4-6 weeks to include your content… they really mean 6-8 weeks.

We went online with our journal not 2 weeks ago and most of our articles are already listed on Google Scholar. Are you using Google Webmaster Tools, and did you submit a sitemap? That’s what I did, and maybe that helped with the inclusion in Google Scholar.
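
Before submitting a sitemap, it can be worth confirming that it actually lists article pages. A minimal sketch, assuming the file follows the standard sitemaps.org format; the `journal.example.org` URLs are placeholders and the `/article/` substring test is just an illustrative way to pick out article pages.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder sitemap for illustration only; not taken from a real journal.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://journal.example.org/index.php/demo</loc></url>
  <url><loc>https://journal.example.org/index.php/demo/issue/view/1</loc></url>
  <url><loc>https://journal.example.org/index.php/demo/article/view/1</loc></url>
</urlset>"""


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL found in the sitemap XML text."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def article_urls(urls):
    """Keep only the URLs that look like article pages."""
    return [url for url in urls if "/article/" in url]


urls = sitemap_urls(SAMPLE_SITEMAP)
print(len(urls), "URLs listed,", len(article_urls(urls)), "of them article pages")
```

If the article count comes back as zero for your own sitemap, that is a hint the crawler is not being pointed at the article pages at all.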

@D_Schroeder_Micheel, I can’t thank you enough. I had not submitted a sitemap, and I do not know if it will help, but I have now done so. On the one hand, I feel embarrassed not to have thought of this; on the other hand, if it works it should probably be included in the OJS wiki somewhere, because I’m sure every OJS user wants Scholar to crawl their articles.

Anurag Acharya of Google Scholar presented what was (in my opinion) the highlight of Open Repositories 2015:

He described some very basic assumptions of Google’s indexing, particularly highlighting Google Scholar’s requirements in contrast to Google web search.

OJS got an honorable mention in the middle of the presentation as a success story, but I think some of the nuggets can help us become even better.

@jamilj did you have any luck with submitting your sitemap? The institute I am working for just released its second OJS journal site, and after successfully submitting the sitemap those articles have also been indexed by Google Scholar, even though they lack abstracts.

@D_Schroeder_Micheel No, I did not, as you can see if you search for any of the articles we have published on our site, Monthly Review Archives. I will keep trying, of course, but this has been extremely frustrating.

Well, it took some weeks, but I finally got indexed by Google Scholar. How did I check? I just searched for some of the keywords included in my journal’s articles, and the articles appeared. I don’t know whether your articles have keywords, or whether you want to display them, but this is how I checked the inclusion in Scholar.

For anyone reading this and wondering about the Google Scholar indexing timeline: we never submitted anything to Google or took any SEO measures for our OJS, and our articles are usually indexed about 4-6 days after publication.

I am happy that others are not having any issues with Scholar, but I must sadly report that my journal is still not being indexed. There is some redundant indexing going on through ProQuest, with whom we have a contract, but Google Scholar simply refuses to index our articles. So far I have done the following, to no avail:

  1. Correctly installed, configured and deployed over 11,000 articles with abstracts, DOIs and all the required metadata.
  2. Submitted a sitemap using webmaster tools.
  3. Made numerous requests to Google to index my site.
  4. Added HTML links to full HTML versions of articles that are available (on a companion website). This comes to about 4,000 of the 11,000+ articles.
  5. Published new content every month, and before the major content providers get our work, i.e., we have them delay publication so that crawlers “see” our content first.
  6. Went through the PDF presentation by Anurag Acharya (mentioned above by @ctgraham) to make sure I did not miss anything.

I am at my wits’ end.

Thanks @jamilj et al. for posting all your questions and progress.

We are also trying to deal with this. Inclusion in Google looks arbitrary, and we are still not able to understand the logic.

@ctgraham, thanks for the links. We will review them carefully.

Hi!
I just had an idea. In OJS the files directory is usually hidden and not web accessible. But if you host your PDFs in web-accessible locations, Google’s bots will most probably find and index them. I know this is not how it should be done, but it will probably work. For example, in OJS 3 it is possible either to upload a PDF and attach it to a galley or just to indicate the path to an existing PDF. So you can upload your PDFs to any location and then reconnect the galleys by links.
Another way is to include the articles in other databases such as DOAJ or Zenodo.
Hope this helps.

Hi @novikoffav,

That’s a dangerous configuration – authors will be able to upload malicious files (e.g. .phtml code) and then construct a URL to cause the server to execute them. Not to mention that you’d expose pre-publication content without any access controls.

Regards,
Alec Smecher
Public Knowledge Project Team

Hi @asmecher,
Well, it is sad. :frowning:

Indexing in GS is a difficult matter. For those who would like to know more, I would recommend reading this book: Improving the Visibility and Use of Digital Repositories Through SEO: A LITA ... - Kenning Arlitsch, Patrick Obrien - Google Books

As for the problems with OJS, you might try configuring it manually in GS:
https://partnerdash.google.com/partnerdash/d/scholarinclusions

Greetings,

OJS 3 is indexed very well in Google Scholar. All published articles in our journal, which has not been officially released yet, are indexed there. The only problem is that Google Scholar is not seeing the XML.

Hi @Vitaliy,

Pre-publication content shouldn’t be indexed – are you sure that’s what’s happening? Is it possible that your PDFs etc. are being accessed through another mechanism, such as a files_dir that’s directly accessible through the web server?
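
One quick way to rule that out is to compare the `files_dir` path from your OJS configuration with the web server’s document root. A rough sketch only: the function name and the example paths are hypothetical, and a plain path comparison cannot see symlinks, aliases, or rewrite rules, so a `False` here is not proof that the directory is private.

```python
import posixpath
from pathlib import PurePosixPath


def files_dir_is_web_exposed(files_dir, document_root):
    """Return True if files_dir is the document root or sits inside it.

    Both arguments are plain path strings; ".." segments are collapsed
    before comparing, but nothing is checked against the real file system.
    """
    files_path = PurePosixPath(posixpath.normpath(files_dir))
    root_path = PurePosixPath(posixpath.normpath(document_root))
    return files_path == root_path or root_path in files_path.parents


# Hypothetical paths for illustration.
print(files_dir_is_web_exposed("/var/www/html/ojs/files", "/var/www/html"))
print(files_dir_is_web_exposed("/var/ojs-files", "/var/www/html"))
```

If the first kind of layout matches your server, moving the files directory outside the document root is the safer arrangement.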

Regards,
Alec Smecher
Public Knowledge Project Team