Statistics, logfiles and GDPR

Describe the problem you would like to solve
As of now (3.5.0.3), OJS saves a hash of the user's IP address in the log files. Strictly speaking, this amounts to pseudonymisation, since a hash can in principle be resolved, and so we are still working with personal data. To comply with GDPR we therefore need to either argue that we pursue a legitimate interest with this, or get user consent (https://gdpr-info.eu/art-6-gdpr/). For legitimate interest we would need to make sure that

The personal data should be adequate, relevant and limited to what is necessary for the purposes for which they are processed. This requires, in particular, ensuring that the period for which the personal data are stored is limited to a strict minimum. Personal data should be processed only if the purpose of the processing could not reasonably be fulfilled by other means.

https://gdpr-info.eu/recitals/no-39/

I see a problem here, as we could fulfil the same purpose if we truly anonymized the IP address, and would then in effect no longer hold personal data that falls under GDPR.

Describe the solution you’d like

As far as I can see, a solution could be to remove the last two octets from the user's IP address before hashing, so that the stored data is anonymous.
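To make the proposal concrete, here is a minimal Python sketch of truncating an IPv4 address before hashing. It is an illustration only, not OJS code: the function names `truncate_ipv4` and `hash_truncated_ip`, the choice of SHA-256, and the restriction to IPv4 are all assumptions.

```python
import hashlib
import ipaddress


def truncate_ipv4(ip: str, keep_octets: int = 2) -> str:
    """Zero out the trailing octets of an IPv4 address (hypothetical helper)."""
    # strict=False lets ip_network accept a host address and mask it down.
    network = ipaddress.ip_network(f"{ip}/{keep_octets * 8}", strict=False)
    return str(network.network_address)


def hash_truncated_ip(ip: str) -> str:
    """Hash only the truncated address, so the full IP is never stored."""
    return hashlib.sha256(truncate_ipv4(ip).encode("utf-8")).hexdigest()


print(truncate_ipv4("192.0.2.57"))
print(hash_truncated_ip("192.0.2.57") == hash_truncated_ip("192.0.200.1"))
```

With two octets removed, all 65,536 addresses in the same /16 range produce the same hash, which is what makes the value anonymous and also what makes it too coarse to tell individual users apart.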

Who is asking for this feature?
Providers

Yours

Felix

Hi @felixhelix

The solution you propose, removing the last two octets from the IP, would not allow us to calculate unique usage (according to COUNTER) or to remove double clicks, i.e. cases where one and the same URL is accessed by one and the same user, who is identified by IP and user agent. We need the whole IP to identify one and the same user.
(We changed our sessions, so maybe we could work with the session IDs instead, but I would need to double check/think about it further.)
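As a rough illustration of why the full (hashed) IP matters here, the following Python sketch filters double clicks by treating hits with the same IP hash, user agent and URL as coming from the same user. The `Hit` type, the function name, the 30-second default window and the choice to keep the first hit of a burst are assumptions for the example, not the actual OJS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    timestamp: float  # seconds since epoch
    ip_hash: str      # hash of the full IP address
    user_agent: str
    url: str


def remove_double_clicks(hits, window_seconds=30.0):
    """Drop repeated requests for the same URL by the same user within the window."""
    last_seen = {}  # (ip_hash, user_agent, url) -> timestamp of the latest hit
    kept = []
    for hit in sorted(hits, key=lambda h: h.timestamp):
        key = (hit.ip_hash, hit.user_agent, hit.url)
        previous = last_seen.get(key)
        # Keep the hit only if this user has not requested this URL recently.
        if previous is None or hit.timestamp - previous > window_seconds:
            kept.append(hit)
        last_seen[key] = hit.timestamp
    return kept
```

If `ip_hash` were computed from a truncated address, different users in the same network range with the same browser would share a key, and their legitimate requests would be discarded as double clicks.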

The solution we took was elaborated back then in the OA Statistics project. We hash the IP address using a salt, and the salt changes daily; we do not keep the old salts. That means that on the next day the IP addresses from the previous days cannot be resolved/recalculated. This one day is the minimum for which we need to store such a hashed, or pseudonymized, IP address while the salt still exists alongside it. After that one day the IP is actually anonymized, because the hash cannot be reverted any more, no? (Maybe the only thing server admins should take care of is not to keep long-term archives of each day's salt.)
Thus, would you still see it as problematic regarding GDPR?
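The daily-salt scheme described above could be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the OJS code: the class name `DailySaltHasher`, the in-memory salt storage, the 32-byte salt and the use of SHA-256 are all hypothetical choices.

```python
import hashlib
import secrets
from datetime import date


class DailySaltHasher:
    """Hash IPs with a random salt that is replaced whenever the day changes."""

    def __init__(self):
        self._salt_day = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        if day != self._salt_day:
            # The previous salt is overwritten and never stored anywhere else.
            self._salt = secrets.token_bytes(32)
            self._salt_day = day
        return self._salt

    def hash_ip(self, ip: str, day: date) -> str:
        salt = self._salt_for(day)
        return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()
```

Within one day the same IP always maps to the same hash, so unique usage and double clicks can still be computed. Once the salt has been replaced, recomputing yesterday's hashes from candidate IP addresses is no longer possible, provided no copy of the old salt survives.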

Thanks a lot!
Best,
Bozana

Hi @bozana

thanks for your reply: I think that there is a legitimate interest in collecting personal data for the service to work effectively (i.e. fulfilling COUNTER requirements), and I would also argue that by changing the salt every day we prevent an easy way to restore the IP (e.g. by using rainbow tables). So I think that we are abiding by the regulations. In the end it comes down to a question of reasonable measures and possible risk.

However, if it were possible to process truly anonymized data, we would not need to inform users about this. So this would make the privacy statement a bit shorter :slight_smile:

We also use Matomo for statistics, and they create a hash based on the truncated IP and some other user data (e.g. operating system, browser, browser plugins, browser language; see the Matomo FAQ "Does Matomo use a fingerprint?"). Maybe this could also be a way to solve the issue?
Or maybe allow admins to decide for themselves? For example, we do not use COUNTER and also have no need for geolocation. So I guess we could do without the last octets and still get a sufficient estimate of visits.
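To illustrate the idea of hashing a truncated IP together with other browser attributes, with the truncation level left to the admin, here is a small Python sketch. It only loosely follows the description above and is neither Matomo's nor OJS's actual code; the function name `visitor_fingerprint`, the chosen attributes, the `keep_octets` setting and the IPv4-only handling are assumptions.

```python
import hashlib
import ipaddress


def visitor_fingerprint(ip: str, user_agent: str, language: str,
                        keep_octets: int = 2) -> str:
    """Hash a truncated IPv4 address combined with browser attributes."""
    # keep_octets stands in for a hypothetical admin setting (4 = no truncation).
    network = ipaddress.ip_network(f"{ip}/{keep_octets * 8}", strict=False)
    parts = [str(network.network_address), user_agent, language]
    # Join with a separator that does not occur in the attribute values.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

Two visitors from the same network range but with different browsers or languages get different fingerprints, which recovers some of the precision lost by truncation. Visitors who share both the range and the browser attributes still collapse into one, so this gives an estimate of visits rather than an exact count.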

Yours,

Felix