Deleting users from the OJS database

Hi all,

I have a github issue open here: Permit the "merge users" CLI tool to handle multiple users at once · Issue #4772 · pkp/pkp-lib · GitHub

It adds support for merging multiple users at once to the command-line tool, removing the need for a shell script loop. It also adds support for specifying users to be merged by user ID, not just username.

Regards,
Alec Smecher
Public Knowledge Project Team

1 Like

Thank you so much to ctgraham and asmecher. Very useful information!!

this could be is usefull to retrieve a list of users to be merged by the tools/mergeUsers.php .

however it seems like some spammers are registered as authors, so they do have a role.

What we want is to refine the query to list all users who:

  • are authors but dont have a submission, and
  • are not reviewers and dont have any other role in any journal.

The problem is that roles’ table uses an integer to registetr this information in the role_id column. I suspect it is a bitwise encoding, but couldn’t find any documentation about it.

So our query to list suspecious users would be:

select user_id from users where user_id not in
 (
select user_id from roles where role_id = <integer for reviewers, managers, admin, etc..>
UNION
select user_id from roles r, articles a  where r.user_id = a.user_id and r.role_id = <integer for authors> 
)

or something like that…

could anyone help us discover what are the integer codes for the roles mentioned on the query?

regards!

Hi @ronniebrito,

The role IDs are defined in lib/pkp/classes/security/Role.inc.php:

define('ROLE_ID_MANAGER',               0x00000010);
define('ROLE_ID_SITE_ADMIN',            0x00000001);
define('ROLE_ID_SUB_EDITOR',            0x00000011);
define('ROLE_ID_AUTHOR',                0x00010000);
define('ROLE_ID_REVIEWER',              0x00001000);
define('ROLE_ID_ASSISTANT',             0x00001001);
define('ROLE_ID_READER',                0x00100000);
define('ROLE_ID_SUBSCRIPTION_MANAGER',  0x00200000);

(Check your own to make sure you’re looking at the right value! They may vary slightly in different versions of OJS/OMP.)

You can turn these hexadecimal numbers into integers e.g. using PHP:

php -r "echo 0x00200000 . \"\n\";"

Regards,
Alec Smecher
Public Knowledge Project Team

1 Like

Hi,
Did you find a refined query to delete (some) spammers that are registered as authors? We have the same problem.

-Obi

1 Like

thanks @asmecher,

so
is 65536

is 1114112

<integer for reviewers, managers, admin, etc…> is always bellow 65536, except when also readers/authors…

@obi, i think the query could be:

select user_id from users where user_id not in
(
select user_id from roles where role_id < 65536
UNION
select r.user_id from roles r, articles a where r.user_id = a.user_id and (r.role_id = 65536 or r.role_id = 1114112)
)

but did not tested it

Thanks. I will test it this week and report to the forum.

Obi

Hi. I have an OJS portal with a half of million users. 99% of them are spam users.

I made some search via users table and merged about 150000 users. It is not difficult to find all users with “porno” or “casino” in url and merge them.

Now i am searching users in user_settings table. I can get not names, but user ids in this table. Is it possible to use mergeUsers.php with user ids? Any way to convert ids to names? I know not enough about sql, and i need a simple tool to make this cleanup.

select username from ojs.users WHERE user_id in (select user_id from ojs.user_settings WHERE setting_value like '%porno%');

Something like this seems to be good…

1 Like

I tested the query but it is wrong:

select user_id from users where user_id not in
(
select user_id from roles where role_id < 65536
UNION
select r.user_id from roles r, articles a where r.user_id = a.user_id and (r.role_id = 65536 or r.role_id = 1114112)
).

I cannot figure out where you got “articles a”.

hi @obi,

it is an alias for table article , the same is for r and table roles…

regards

I can’t see any table in the database called articles in OJS 3.x

@obi, sorry, I was playing on an OJS2.x database… in OJS 3 not even roles table exist!

on OJS3 you can use tables users and authors to figure out user who are not in authors table…
also user_groups to see if they have other roles (i guess)

I made a username list (in ojs2) by exporting list of URL column and making frequency list (some urls was in 20-30k user profiles) and by exporting usernames wits “stop phrases” in user_settings table.

Now, I deleted about 300K users (via mergeUsers.php, it is enough slow) from this list (only with reader role) and I see about 200K users in my system now. It is no more possible to delete bad users by big groups (there is no stop phrases or bad urls more).

ReCAPTCHA is enough good. It was about 1000 new users per day, and now I see about 10 new users per day.

One more strange moment - more than 90% of bad users was enrolled in one journal (we have 15 journals in our ojs). There was no comments or articles by this authors, but profiles (with porn, casino and other) was in google cache.

Conspiracy theory advocates say this is an “attack” on the journal.

sorry to be so basic, but how do I bring up a command line if on a home pc for this equivalent: https://JournalURL/base/tools/mergeUsers.php

Hi @gail,

That depends on your operating system. Have a look e.g. at websocket - How to run php file through command line? - Stack Overflow.

Regards,
Alec Smecher
Public Knowledge Project Team

  1. I have a php executable in my “path” on windows and on linux.
  2. I make a list of “bad” users by database query ( for example select username from ojs.users WHERE user_id in (select user_id from ojs.user_settings WHERE setting_value like '%porno%');)
  3. I make a batch file in a text editor by replacing “\nbadusername” to “\nphp pathtotools\megreusers.php tempusername badusername”
  4. After this, I just run batch file in the shell (windows or linux) and see a result. Deleting/merging of big (10K+) groups of users is very long procedure.
3 Likes

Hi.

Help me please to make an sql querry, to get a list of users with their registration date, last login date and group. May be, it will be good even to get only users (usernames) with only a “reader” role with registration date = last login date.

I want to delete all “readers only” with registration and last login before 2019.

I allready deleted more than 750K bots, bul last 20-30K are not clearable in batch mode.

Hi. We made a great cleanup of our database. A lot of users was deleted/megred. But it is unclear for me how to cleanup “user_interests” table in right way. We have a lot of allready deleted users (ids) in this table.

May be, mergeUsers.php should be impoved?

Can you advise me if I’m forming the SQL query correctly to delete users from my database?

DELETE FROM users
WHERE id IN (
  SELECT user_id
  FROM comments
  WHERE content LIKE '%casino%'
);

Also, does anyone know how to prevent new spam users from simply registering? I understand how this can be done when they use direct words like “casino” in their spam, but not every link that is spam contains these words.