Google indexing non-public journals

Hi all,

We have been using non-public journals as sandboxes for our production journals, where users can experiment with CSS styling and other functionality. We are currently unable to host a separate OJS instance purely for testing, hence this approach.

An unexpected side-effect is that Google was able to find these non-public test journals and index them in both main search and Scholar.

Is there a way to prevent the indexing of non-public journals? Is there an automatic sitemap somewhere in OJS that lists all the journals? We have not explicitly set up a sitemap for our instance.

Thank you,

Sean Xiao

Google will honor a robots.txt file:
http://www.robotstxt.org/robotstxt.html

You can use that to block indexing of specific paths.
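
As a rough sketch of what such rules might look like, and a way to check them before deploying, the snippet below uses Python's standard `urllib.robotparser`. The journal paths (`sandbox`, `testjournal`) and the `example.org` host are made up for illustration; substitute the paths of your own non-public journals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The journal paths below are assumptions
# for illustration only; use the paths of your own test journals.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/sandbox/
Disallow: /index.php/testjournal/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/index.php/sandbox/issue/current",
        "https://example.org/index.php/realjournal/",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

The file itself would go at the root of the site (e.g. `/robots.txt`), with one `Disallow` line per test journal.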

Hi @SeanXiaoZhao,

If your journal is not flagged “enabled” in the Journal Manager’s “Hosted Journals” area, then it shouldn’t be indexed by Google – unless something links to it somewhere outside of OJS. It’s possible, if you’re using an older release of OJS, that there was a link somewhere that could’ve exposed the content. But @ctgraham’s suggestion is the best solution.

Regards,
Alec Smecher
Public Knowledge Project Team

This will also be addressed directly within the software in the next major release; see pkp/pkp-lib issue #592, "Fix behavior when journal not publicly enabled", on GitHub.

Hi,

I actually wrote a small plugin for OJS 2.x that adds a robots meta tag to hidden (unpublished) journals to prevent search engine indexing. It is available on GitHub as ajnyga/addRobotMeta.

Seems to work without any problems, but let me know if you see one.

Happy to hear that this is also addressed in 3.0!
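
If anyone wants to confirm that a hidden journal's pages actually carry the tag, here is a minimal sketch using Python's standard `html.parser`. It assumes the tag takes the usual `<meta name="robots" content="noindex">` form; I have not reproduced the plugin's exact output here, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    # Made-up page source, standing in for a hidden journal's HTML.
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(sample))
```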

Hi @ajnyga,

Excellent little plugin!

Regards,
Alec Smecher
Public Knowledge Project Team
