After upgrading to OJS 3.1.0.1, our access log and error log are filling up with 404 errors caused by requests to old citation plugin addresses.
There are so many of them that Google Webmaster Tools has also emailed me about the growing number of 404s twice now (edit: three times now). At first I thought they would eventually stop, but they just keep coming. Any idea what is causing them? I think this is connected to the change that occurred with the citationStylePlugin. Maybe there should be a 302 redirect of some sort? @NateWr @asmecher
Here are a few of them from within just a couple of seconds. This is from the access log; there is of course a corresponding 404 error line in the error_log.
54.xx.xx.70 - - [18/Feb/2018:21:13:04 +0200] "GET /journal1/article/cite/50582/CbeCitationPlugin HTTP/1.1" 404 22
54.xx.xx.85 - - [18/Feb/2018:21:13:18 +0200] "GET /journal2/article/cite/60023/RefWorksCitationPlugin HTTP/1.1" 404 22
54.xx.x.143 - - [18/Feb/2018:21:13:19 +0200] "GET /journal3/article/cite/58614/BibtexCitationPlugin HTTP/1.1" 404 22
54.xx.xx.39 - - [18/Feb/2018:21:13:20 +0200] "GET /journal4/article/cite/8313/RefManCitationPlugin HTTP/1.1" 404 22
All these errors are making the error log pretty useless, or at least hard to follow. I got around a hundred 404 errors within 5 minutes, so after a day or so you really have to dig to find the actual errors.
I am worried that the large number of 404s will affect our results in Google. Maybe when handlers are removed from OJS, there should be a policy of first adding a 302 redirect and removing the actual handler only later?
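To see which of the old citation endpoints account for the noise, the access log can be tallied by plugin name. The following is only a rough sketch in Python, assuming the log lines look like the samples above; the function name and regular expression are illustrative and not part of OJS.

```python
import re
from collections import Counter

# Matches the request and status part of an access log line such as:
#   "GET /journal1/article/cite/50582/CbeCitationPlugin HTTP/1.1" 404 22
# Both straight and typographic quotes are accepted, since pasted logs vary.
LEGACY_CITE_RE = re.compile(
    r'[“"][A-Z]+ /(?P<journal>[^/\s]+)/article/cite/(?P<id>\d+)/(?P<plugin>\w+)'
    r' HTTP/[\d.]+[”"] (?P<status>\d{3})'
)

def count_legacy_cite_404s(lines):
    """Count 404 responses per legacy citation plugin name."""
    counts = Counter()
    for line in lines:
        match = LEGACY_CITE_RE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("plugin")] += 1
    return counts

# The four sample lines quoted in this post.
SAMPLE_LINES = [
    '54.xx.xx.70 - - [18/Feb/2018:21:13:04 +0200] "GET /journal1/article/cite/50582/CbeCitationPlugin HTTP/1.1" 404 22',
    '54.xx.xx.85 - - [18/Feb/2018:21:13:18 +0200] "GET /journal2/article/cite/60023/RefWorksCitationPlugin HTTP/1.1" 404 22',
    '54.xx.x.143 - - [18/Feb/2018:21:13:19 +0200] "GET /journal3/article/cite/58614/BibtexCitationPlugin HTTP/1.1" 404 22',
    '54.xx.xx.39 - - [18/Feb/2018:21:13:20 +0200] "GET /journal4/article/cite/8313/RefManCitationPlugin HTTP/1.1" 404 22',
]

if __name__ == "__main__":
    for plugin, hits in count_legacy_cite_404s(SAMPLE_LINES).most_common():
        print(plugin, hits)
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.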
Hmm, yes, we removed these hand-written citation format plugins in favour of a CSL-based implementation. In other parts of OJS where previously used URLs changed, we introduced 301 Moved Permanently redirects, but those were to avoid breaking useful links, whereas I think this is more a case of a potential SEO issue, which is a lower priority. I’d be happy to review/merge a proposal on this.
Regards,
Alec Smecher
Public Knowledge Project Team
I guess I could reintroduce the cite handler and just add the redirects there.
Maybe:
Check whether the Citation Style Language plugin is enabled.
If it is enabled, map requests to the new URLs. For example:
/journal3/article/cite/58614/BibtexCitationPlugin => /journal3/citationstylelanguage/download/bibtex?submissionId=58614
If there is no match (for example, RefWorksCitationPlugin seems to be missing from the new plugin), redirect to the submission abstract page.
If the Citation Style Language plugin is disabled, redirect all requests to the correct abstract page.
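The decision logic above could be sketched roughly as follows. This is Python for illustration only, not OJS handler code: the function name, the mapping table, and the abstract page path /journal/article/view/ID are assumptions, and only the BibTeX mapping comes from the example above. The 301 status is one possible choice; a 302 would slot in the same way.

```python
import re
from typing import Optional, Tuple

# Old-style citation URL: /<journal>/article/cite/<submissionId>/<PluginName>
LEGACY_CITE_PATH = re.compile(
    r"^/(?P<journal>[^/]+)/article/cite/(?P<id>\d+)/(?P<plugin>\w+)/?$"
)

# Assumption: only the BibTeX target is known from the example above.
# Any other entry would have to be checked against the new plugin first.
LEGACY_TO_CSL_DOWNLOAD = {
    "BibtexCitationPlugin": "bibtex",
}

def resolve_legacy_cite(
    path: str, csl_enabled: bool, fallback_to_abstract: bool = True
) -> Optional[Tuple[int, str]]:
    """Return (status, location) for a redirect, or None to serve a 404."""
    match = LEGACY_CITE_PATH.match(path)
    if match is None:
        return None  # not a legacy citation URL at all

    journal = match.group("journal")
    submission_id = match.group("id")
    plugin = match.group("plugin")

    # Plugin enabled and a known equivalent exists: redirect to the new URL.
    if csl_enabled and plugin in LEGACY_TO_CSL_DOWNLOAD:
        fmt = LEGACY_TO_CSL_DOWNLOAD[plugin]
        return 301, f"/{journal}/citationstylelanguage/download/{fmt}?submissionId={submission_id}"

    # No equivalent, or plugin disabled: send the visitor to the abstract page
    # (assumed path), unless the caller prefers to keep these as 404s.
    if fallback_to_abstract:
        return 301, f"/{journal}/article/view/{submission_id}"
    return None
```

Setting fallback_to_abstract to False leaves unmatched formats as 404s, which keeps both options open.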
I’d be tempted to have imperfect matches stay as 404s, as that’s technically more accurate – the content was here but now it doesn’t exist anymore – but it’s not a strong opinion.
Regards,
Alec Smecher
Public Knowledge Project Team
You are probably right. I am having a hard time understanding where all those thousands of hits are coming from. They seem to be at least partly search engines, but you would think they would learn within a few weeks that the resource is really gone. I will give it a few more weeks and see what happens.
I’d assume it’s a bevy of crawlers that picked these URLs up from the publishing front-end e.g. via the article view. Does the user agent include any indication?
Regards,
Alec Smecher
Public Knowledge Project Team
Hmm, I see now that our access_log is not saving the user agent (it could be a GDPR setting they made on our server; I have to ask).
But from the IPs I can see that at least some of them are coming from Google. So yeah, probably crawlers, but you would think they would learn in 6 months that those URLs no longer exist.
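Without the user agent, one way to confirm that an IP really belongs to Google is a reverse DNS lookup followed by a forward lookup of the returned hostname. Below is a minimal Python sketch of that check. The function names are made up for illustration, the accepted domain suffixes are an assumption based on how Google documents Googlebot verification, and the resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Assumption: genuine Google crawler hostnames end in one of these suffixes.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname):
    """True if the hostname sits under one of the accepted Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)

def looks_like_google_crawler(
    ip,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Reverse-resolve the IP, check the domain, then confirm that the
    hostname resolves back to the same IP."""
    try:
        hostname = reverse_lookup(ip)[0]
        if not is_google_hostname(hostname):
            return False
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        return ip in forward_lookup(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

This needs the full IP address, so it would have to be run against the unmasked log entries.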