All of Our Content Was Wiped Out of Google Scholar

soapubs · November 26, 2024, 7:48am

Dear PKP Team, and fellows in the community,

Our content was being indexed smoothly by Google Scholar; even on Sunday, November 24, we still noticed newly published content appearing in Google Scholar. However, starting on Monday, November 25, we noticed that all of our previously indexed content had disappeared from the platform, as if it no longer existed.

Now only one record, from ResearchGate, appears here when searching for soapubs.com.

For reference, here is the link to our website: https://soapubs.com, and here are a few examples of our published content:

https://soapubs.com/index.php/EI/article/view/161
https://soapubs.com/index.php/MI/article/view/40
https://soapubs.com/index.php/IJLAI/article/view/46

We have thoroughly checked our setup and confirmed that our metadata complies with Google Scholar’s requirements. We are using Open Journal Systems (OJS), and the Googlebot is able to access our pages in under 10 seconds. We also ensure that each article has proper links to the abstract, PDF, and other relevant content.

There is also something I want to clarify. I added some code to the Custom Header plugin’s footer content to change the text ‘Affiliation’ to ‘Phone Number’ on the author registration form. I mention this because our content could no longer be found in GS after this change, so I suspect it might be the cause.

The code is shown below:

<script type="text/javascript">
document.addEventListener('DOMContentLoaded', function () {
  // Find the affiliation label
  var affiliationLabel = document.querySelector('.form-group.affiliation label');
  // Make sure the label exists
  if (affiliationLabel) {
    // Directly replace "Affiliation" with "Phone Number" in the label's markup
    affiliationLabel.innerHTML = affiliationLabel.innerHTML.replace("Affiliation", "Phone Number");
  } else {
    console.error("Affiliation label not found");
  }
});
</script>

<script type="text/javascript">
document.addEventListener('DOMContentLoaded', function () {
  // Find the .label element inside the affiliation block
  var affiliationLabel = document.querySelector('.affiliation .label');
  // Make sure the .label element exists
  if (affiliationLabel) {
    // Only change the leading text from "Affiliation" to "Phone Number"
    var labelTextNode = affiliationLabel.childNodes[0]; // First child node (expected to be a text node)
    if (labelTextNode && labelTextNode.nodeType === Node.TEXT_NODE) {
      labelTextNode.textContent = "Phone Number ";
    }
  } else {
    console.error("Affiliation label not found");
  }
});
</script>
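For readers who want a single script instead of two, here is a minimal sketch that tries both selectors from the snippets above in turn. The function name `relabelAffiliation` and its `doc` parameter are hypothetical, introduced only for illustration. It also behaves slightly differently from the originals: it updates the first text node it finds among the label's children rather than rewriting `innerHTML`.

```javascript
// Sketch only: `relabelAffiliation` is a hypothetical helper, not part of OJS.
// `doc` is any object with a querySelector method (normally `document`).
function relabelAffiliation(doc, newText = "Phone Number") {
  // Selectors copied from the two snippets above.
  const selectors = [".form-group.affiliation label", ".affiliation .label"];
  for (const selector of selectors) {
    const label = doc.querySelector(selector);
    if (!label) continue;
    // nodeType 3 is Node.TEXT_NODE; editing only that node keeps nested markup intact.
    const textNode = Array.from(label.childNodes || []).find((n) => n.nodeType === 3);
    if (textNode) {
      textNode.textContent = newText + " ";
    } else {
      label.textContent = newText;
    }
    return true; // A label was found and updated.
  }
  return false; // Neither selector matched.
}

// Only wire up the event listener when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => relabelAffiliation(document));
}
```

Passing the document in as a parameter keeps the function easy to try out with a stub object outside the browser.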

Could you and the PKP team kindly help us understand the root cause of this issue? If there is anything further we can do to resolve it or any suggestions to get our content re-indexed, we would greatly appreciate your guidance.

mpbraendle · November 26, 2024, 10:34pm

I don’t think your JavaScript DOM change has anything to do with that; bots usually do not interpret JavaScript.

Please don’t post similar content four times; see the Forum Guidelines and Code of Conduct.

You may want to follow up with Google directly; see their contact address on Google Scholar Help.

soapubs · November 27, 2024, 2:01am

Dear @mpbraendle, many thanks for your message. It’s my fault; I just wanted to open a new post.

Your suggestion is a good idea, but the difficulty is that their official website lists only an email address, and I have heard that they do not check that mailbox often.

I have seen in the community that many colleagues have issues similar to mine, and I am wondering whether they have solved them.

soapubs · December 2, 2024, 7:51am

Has no one else had the same experience?

kmccurley · December 2, 2024, 8:35am

(Former Google employee here.) The Google crawler definitely executes JavaScript after it crawls a page, but there is no guarantee that the results will be used in the indexed content. See this documentation. There is rather little public information about what Google Scholar chooses to include in its index or how it post-processes the crawled information. My understanding is that they generally recognize OJS sites and will probably index a site if all other criteria are satisfied. There are constraints, such as files needing to be under 5MB and everything needing an abstract. My suspicion is that they may now use a machine learning process to decide whether content looks like scholarly material; otherwise Google Scholar would be vulnerable to spam. See their inclusion guidelines.
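The two constraints mentioned above could be checked before publishing with something like the following. This is only a sketch: the function name `findIndexingProblems` and the record shape (`pdfBytes`, `abstract`) are hypothetical, the 5MB figure is simply the one quoted in this post (read here as 5 × 1024 × 1024 bytes, which is an assumption), and passing this check says nothing about whether Google Scholar will actually index an article.

```javascript
// Sketch only: the limit is the figure quoted in this thread, and the
// article record shape ({ pdfBytes, abstract }) is a hypothetical one.
const MAX_PDF_BYTES = 5 * 1024 * 1024;

// Returns a list of human-readable problems; an empty list means both checks passed.
function findIndexingProblems(article) {
  const problems = [];
  if (typeof article.pdfBytes !== "number" || article.pdfBytes >= MAX_PDF_BYTES) {
    problems.push("PDF is missing or not under 5 MB");
  }
  if (typeof article.abstract !== "string" || article.abstract.trim() === "") {
    problems.push("abstract is missing or empty");
  }
  return problems;
}

// A small PDF with an abstract, then an oversized PDF with a blank abstract.
console.log(findIndexingProblems({ pdfBytes: 1200000, abstract: "We study ..." }));
console.log(findIndexingProblems({ pdfBytes: 6 * 1024 * 1024, abstract: "  " }));
```

The first call returns an empty list; the second reports both problems.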

soapubs · December 2, 2024, 8:57am

You’re absolutely right, and I agree with your points. However, what I want to clarify is that we are confident we meet Google’s indexing requirements. To summarize, the key criteria are having a directly accessible abstract, PDFs under 5MB, and the correct meta name tags. OJS users generally have the Dublin Core Indexing Plugin and the Google Scholar Indexing Plugin enabled, and I’ve checked our publications to confirm that their meta names are correct, at least for the ones I’ve reviewed. Furthermore, we’ve followed the suggestions provided by both PKP and Google Scholar, such as ensuring consistency in language and content.

In short, it’s very strange that our entire journal disappeared from the search results within a single day. Honestly, I don’t have much expertise in web technologies; my knowledge is primarily in journal management. However, I am now able to check on my own whether we meet Google Scholar’s indexing requirements. If you’re willing, could you take a look at our website’s source code? There doesn’t seem to be any issue, but the articles are mysteriously not showing up:
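As a rough illustration of the kind of meta tag check described above, here is a minimal sketch that scans a page's HTML for a few `citation_*` tags. The tag names are the ones documented in Google Scholar's inclusion guidelines; treating exactly these four as required, the function name `findMissingMetaTags`, and the sample HTML are all assumptions made for this example. A regular expression is used for brevity and is not a full HTML parser.

```javascript
// Tag names come from Google Scholar's inclusion guidelines; treating
// exactly these four as required is an assumption of this sketch.
const EXPECTED_TAGS = [
  "citation_title",
  "citation_author",
  "citation_publication_date",
  "citation_pdf_url",
];

// Returns the expected tag names that do not appear as <meta name="..."> in the HTML.
function findMissingMetaTags(html) {
  const present = new Set();
  const metaPattern = /<meta\s+[^>]*name\s*=\s*["']([^"']+)["'][^>]*>/gi;
  for (const match of html.matchAll(metaPattern)) {
    present.add(match[1].toLowerCase());
  }
  return EXPECTED_TAGS.filter((tag) => !present.has(tag));
}

// Hypothetical page head, not taken from any real article.
const sampleHtml = `
<head>
  <meta name="citation_title" content="An Example Article">
  <meta name="citation_author" content="Doe, Jane">
  <meta name="DC.Title" content="An Example Article">
</head>`;

console.log(findMissingMetaTags(sampleHtml));
```

For the sample above, the function reports `citation_publication_date` and `citation_pdf_url` as missing. To check a real article, the page source saved from the browser could be passed in as the `html` string.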
https://soapubs.com/index.php/EI/article/view/161

If I were to guess, perhaps it’s related to the quality of our articles not meeting Google Scholar’s internal, somewhat opaque requirements, or maybe it’s an issue with our IP (China) preventing us from appearing on Google?

We’re really at a loss here. Could you help us?

soapubs · December 2, 2024, 9:00am

Oh, BTW, we deleted all of this JavaScript after our content disappeared from GS, but unfortunately that did not help.

kmccurley · December 2, 2024, 8:10pm

I’m not really an OJS user, and Google Scholar is an opaque service that is impossible to predict or control. One thing you might try is to encourage authors to create Google Scholar profiles and add their papers to their profiles. Ultimately, I think it also depends on whether other journals that you don’t manage cite your DOIs. This is impossible for you to control, but Scholar probably uses these citations as indicators of content quality. If something doesn’t have a sufficiently high H-index, then it may fail to be indexed. These things take time, sometimes years.

To be honest, I wouldn’t be surprised if Google Scholar just goes away in the future once the founders retire. Scholar is a hard thing to run and is vulnerable to all sorts of spam and manipulation. It’s not an important product for the company, and the company has a habit of tearing things down if they don’t like running them (e.g., Google+, Google Reader, Chromecast, Google Podcasts). Microsoft discontinued their academic search project in 2021 and handed the data over to openalex.org.

asmecher · December 9, 2024, 5:48pm

Hello all –

There are a lot of “me too” threads about recent Google Scholar issues, so I’ll direct them into one place and close off the other threads in order to help organize the forum.

I’ll leave this thread open:

Regards,
Alec Smecher
Public Knowledge Project Team