DOAJ plugin exports author email addresses - data security issue

Hi,
we are using OJS 3.1.0.1 (soon to be 3.1.1.4) and currently use the DOAJ export plugin to generate XML files for article-level indexing in DOAJ and CNKI. However, doajArticles.xsd, which holds the XML schema definition, contains this:

<xs:element name="author" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="email" type="xs:string" minOccurs="0" />
      <xs:element name="affiliationId" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

and thus the final XML contains the authors' email addresses wherever they are available in OJS. For data protection reasons, we have to remove these before sending the XML files to DOAJ. We currently do not use the automated ‘register’ function but upload the XML directly on the DOAJ website. Also, DOAJ explicitly states on its website that email addresses should not be uploaded.

How can I fix this? Does OJS fetch the most current doajArticles.xsd from the DOAJ website (the link is in the plugin code)? If so, does DOAJ itself have to change the schema?
Or can I change the doajArticles.xsd that I see in the plugin directory on my OJS server?

The code is here:

I concur that the instructions from DOAJ indicate this should be omitted:

I got a reply from the DOAJ team regarding this issue:

DOAJ: Back in April this year we wrote an explicit piece of code which removed all email addresses which publishers had given us and we also updated the ingestion process to ignore all email addresses which were sent to us. That way, the DOAJ db never has email addresses in it. So in terms of GDPR regulations, the db is secure that way.
me: Is there any need for this line of code? It would be best to change the doajArticles.xsd by removing it so all automated doaj indexing tools (I am sure there are more than the OJS plugin) can work with an acceptable level of data security and data economy.
DOAJ: So, no, there is no need for this line of code in theory. We will get this over to our tech partners and log this piece of work. It will be in a queue but we will get to it.

Sounds like they put it in their issue tracker and will change the schema sooner or later. Also, publishers don’t need to worry over e-mail addresses they have already sent during the indexing process.

DOAJ will take care of it in their repository. But how do you handle it when you submit the same XML file (generated by the OJS DOAJ plugin) to CNKI? I am also submitting the same XML file to CNKI. How can the email addresses be removed?

We just started indexing our journals at CNKI this year. So for the last three issues of the most important journal, I did it by hand. The other journals including back issues will be about 2000 articles or more. No way I am removing email addresses by hand for these. I will wait until our update to the newest OJS version is done in a few weeks and then try to change the lines of code in DOAJXmlFilter.inc.php @ctgraham pointed me to in his post above.
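For the bulk case, a small script run over the exported files might be an interim workaround until the plugin itself is changed. The following is only a minimal sketch in Python, not part of OJS or the DOAJ plugin. It assumes the export uses un-namespaced `author` and `email` elements as declared in the schema snippet above; the function names and file paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_author_emails(root: ET.Element) -> int:
    """Remove every <email> child of every <author>; return how many were removed.

    Assumption: elements are un-namespaced and named as in doajArticles.xsd.
    """
    removed = 0
    # Materialise the author list first so the tree is not modified mid-traversal.
    for author in list(root.iter("author")):
        for email in author.findall("email"):
            author.remove(email)
            removed += 1
    return removed


def clean_file(src_path: str, dst_path: str) -> int:
    """Read an exported XML file, strip the emails, and write a cleaned copy."""
    tree = ET.parse(src_path)
    removed = strip_author_emails(tree.getroot())
    tree.write(dst_path, encoding="UTF-8", xml_declaration=True)
    return removed
```

Calling `clean_file("doaj-export.xml", "doaj-export-clean.xml")` (both file names are placeholders) would write a copy without the `email` elements and return the number removed. Since `email` is declared with `minOccurs="0"`, the cleaned file should still match the schema snippet quoted above, but it is worth checking the result before uploading it to DOAJ or CNKI.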

@heike_riegler Thanks for your prompt reply. Please share your experience once you have customized DOAJXmlFilter.inc.php.

I’m scheduling this as an issue with a fix against OJS 3.1.1-5.
https://github.com/pkp/pkp-lib/issues/4236

Hu, that was quick. Thank you!