Cleaning up old user records

Hello All,

One of our libraries asked me to clean up old/unused user records from our OJS (3.3.0-22). From what I’ve read here, the best way is using mergeUsers.php, which I can do without any problems. Their requirements are:

  • Users with no roles.

This ones I can get with something like:

select distinct username,
date_registered,
date_last_login
from users
where user_id not in (select user_id from roles)
order by date_registered

  • Users who haven’t logged in for more than 5 years.

Easy one too:

select distinct username,
date_registered,
date_last_login
from users
where date_last_login < date(‘2021-03-12’)
order by date_last_login

  • Users who only have the role of Reader

And that’s what I don’t know how to do. A query that shows all the roles of all users would be even better. Where can I find documentation about this?

One last thing: Is it safe to run mergeUsers.php with users using OJS, or should I stop Apache first? I imagine that cleaning up a few users wouldn’t make a difference, but we probably have at least around 10 or 15 thousand to delete.

Regards,

Oberdan

Hi @Oberdan,

To look into roles, join on user_user_groups and from there to roles. The roles table has entries for each journal (context_id, which can be joined onto journals.journal_id). The role_id constants in roles are the ROLE_ID_… constants from the PHP code; see https://github.com/pkp/pkp-lib/blob/main/classes/security/Role.php for details (this is for main, which will be released as OJS/OMP/OPS 3.6, but should be accurate for previous releases back to at least 3.3 as well).

You can explore this in the schema documentation.

There is no need to stop Apache when merging/deleting users, but make sure to take a complete backup before doing drastic admin work like this!

Regards,
Alec Smecher
Public Knowledge Project Team

Hello @asmecher ,

Just tried the first query, to get the users wth no roles, and a user I just created and gave the Reviewer role showed anyway… Any ideas why this would happen?

Regards,

Oberdan

Hi @Oberdan,

In OJS 3.3.0-x and later, there is no roles table. Maybe this is left from a former version? You should be joining on user_user_groups and user_groups as I mentioned above.

Regards,
Alec Smecher
Public Knowledge Project Team

Hello @asmecher ,

I was starting to think there was something very wrong with our database… Thank you!

Regards,

Oberdan

1 Like

Hello @asmecher ,

The old tables that didn’t belong to OJS 3.3.0-22 were dropped, so we’ll never have problems related to them again.

My new query for users with no roles, now using user_user_groups is:

  select distinct u.username username, 
                  us.setting_value nome, 
				  u.date_registered data_cadastro, 
				  u.date_last_login ultimo_login, 
				  u.email email, 
				  u.country pais
   from users u,
        user_settings us
   where u.user_id not in (select user_id from user_user_groups)
     and us.user_id = u.user_id
	 and us.setting_name = 'givenName'
order by date_last_login

And to get the user roles I’m using something like this:

select distinct u.user_id, 
              u.username username, 
              us.setting_value nome,
			  u.date_registered data_cadastro, 
			  u.date_last_login ultimo_login, 
			  u.email email, 
			  u.country pais,
			  ug.context_id  id_revista,
			  js.setting_value  nome_revista,
			  ugs.setting_value papel 
from users u,
     user_groups ug,
     user_settings us, 
     user_user_groups uug,
     user_group_settings ugs,
     journal_settings js
where us.user_id = u.user_id
  and uug.user_id = u.user_id 
  and uug.user_group_id = ugs.user_group_id
  and ug.user_group_id = uug.user_group_id
  and js.journal_id = ug.context_id
  and ugs.setting_name = ‘name’
  and ugs.locale in (‘en_US’,‘pt_BR’)
  and us.setting_name = ‘givenName’
  and js.locale = ‘pt_BR’
  and js.setting_name = ‘name’
  and trim(us.setting_value) <> ‘’
  and (lower(ugs.setting_value) like ‘%manager%’ or lower(ugs.setting_value) like ‘%adm%’ )
order by ultimo_login

Maybe not perfect, but worked fine in my tests.

Regards,

Oberdan

We are doing a similar user cleansing, but more manually, and only with users with bounced emails. Depending of your country and the personal data regulations, you may have additional restrictions in deleting personal data.

For example, in Spain, as a UE country with GDPR, we cannot delete users freely, and the data protection officer of our organization told us that we also have to take in consideration the intellectual property laws.

For example, any user that has send any submission at any time cannot be deleted, we can just disable them, just in case they have a legal problem in the future. We established that users that have not send any submission but have participated in any other way in a submission (in a revision, in an editor decision, as copywriter, etc.) can only be deleted if there were not any activity (login session) in the last 5 years, and only if the submission they participated was published more than 4 years ago .

This deadline is not because a legal reason, but because a specific Spanish journal quality evaluation. They ask for the internal documentation of the last 4 issues of the journals, and if the journal is annual, you have to keep it for 4 years, plus 1 additional year just in case the editorial revision was too long. We usually rely on the submission ID, so we set a number for the ID that was published more than 4 years ago, and all the ID involved must have a smaller numbe r.

The users with any submission activity, we can delete them without looking their last login date. But that’s because we only delete the users with bounced emails, so their personal information (email) is wrong and we are allowed to correct wrong information according to GD PR.

The users with editor or manager roles are an exception, so we talk first with the journal editors if they knew them before doing anyth ing.

When a user is disabled (this sound really wrong), the field reasons for disabling the user is really useful, we use as a message to us in the future. For example, the users that must keep disabled forever, or the date when a user can be disabled. We are using OJS 3.3.0.22, maybe other versions work diffe rent.

So, the IT team create a spreadsheet with all the relevant information for all the users of all the journals, and the librarians check for the emails of the users, and make decisions following a decision tree, then the librarians send regularly spreadsheets to the IT team so they can apply the requested changes quicker than manually.

But maybe your country does not have such a strict regulation and you do not have to complicate things as much as we do.

Hi @asmecher - the link to the schema documentation in your post is the same as to pkp-lib/classes/security/Role.php at main · pkp/pkp-lib · GitHub - could you please post the right one?

Hi @mpbraendle,

Good catch, I’ve fixed the link to the schema documentation above. (It’s here.) We’re doing some maintenance on the server that hosts this stuff, and after that’s done we’ll be adding SchemaSpy documentation for 3.5 as well – but the role and user group aspects are consistent across 3.3, 3.4, and 3.5 with only minor differences (e.g. nulls versus 0s for some columns).

Regards,
Alec Smecher
Public Knowledge Project Team

This topic was automatically closed after 10 days. New replies are no longer allowed.