Collaborative list of spam user patterns

Whether you’ve inherited a new magazine or it’s time to put your house in order, in the forum you can find several good threads on how to deal with spam users but I miss an unified and updated list of spam patterns like the one @jmacgreg started in How do I combat spam?, so, here you have a forum-wiki-page here with an initial compilation that could be collectively extended/enriched.

I tried to make it compact and with same structure to facilitate the reading and are numbered for an easy reference. I also add a “False positives” field (with an estimation like “high/middle/low” probability) to describe in what situations you can get real users instead of spammers.

If you found a new pattern and you like to share it, when editing, please distinguish between queries that return spam users (with eventually a very few false positives) from the ones that help to identify them (list of users to dig further).

DISCLAIMER: Please, use this list with care because is not bullet prof, and you can get false positives. I’m so sorry I didn’t credit the original authors of those snippets. Kudos for them. You can find their original posts in the forum searching the published queries.

TIPS FOR NEWBIES
To facilitate the exportation to mergeUsers.php script, queries return “username”.
Use phpmyadmin or adminer (or simply redirect the output of the script to a file) to get a list.
Replace “username” with “count()" to get the number of affected users or with "” to get all data from one user.

Some links you would like to read if you are concerned about spam in OJS:

Return spammers username:

01. Phone/Fax number starting by “12345%”

SELECT username FROM users 
WHERE phone LIKE "12345%";

False positives: Low.
This is usually safe, but sometimes real users add a fake phone/fax.

02. Users promoting webs in their bio:

SELECT username FROM users 
WHERE url LIKE "%porn%"
OR url LIKE "%loans%"
OR url LIKE "%games%"
OR url LIKE "%amoxicillin%"
OR url LIKE "%cheap%"
OR url LIKE "%buy%"

False positives: Middle.
Too much combinations, too short or too generic filters will return a lot of false positives.

03. Registered users with NO role

SELECT username FROM users 
WHERE user_id NOT IN (SELECT user_id FROM roles)

False positives: Low.
Would return false positives with the users that used to have a role in the journal but it was revoked.

04. Users that registered and login in less than 3 seconds

SELECT * FROM users 
WHERE user_id NOT IN (SELECT user_id FROM roles) 
AND (UNIX_TIMESTAMP(date_last_login) - UNIX_TIMESTAMP(date_registered) < 3) 
AND must_change_password = 0

False positives: ??
This is supposed to be safe because only robots can go so fast, but I’m getting a lot of false positives.

05. Same name, middle name and surname

SELECT username
FROM users 
WHERE last_name LIKE first_name
AND middle_name LIKE first_name

A variant, a little bit more risky would be:

SELECT username 
FROM users 
WHERE last_name LIKE CONCAT(first_name, "%") 
AND middle_name LIKE CONCAT(first_name, "%") 

False positives: Low.
You can also try without middle_name, but triple check is safer. The variant can match real users like “Steve Stevenson”

06. Phones with wrong length (11 digits)

SELECT * FROM users 
WHERE phone REGEXP "^[0-9]{11}$"

False positives: High
It’s quite common to se real users adding a fake phone/fax or with typing errors.

07. Mailing address with “@”

SELECT username FROM users 
WHERE `mailing_address` LIKE '%@%'

False positives: High
This one could fail a lot. As robots, some humans also are confused and add their mail to this field.


To identify spam

A. Users from well-known spam domains:

SELECT username FROM users 
WHERE email LIKE "%eamale%"

B. List of most used domain names in mail.

SELECT substring_index(email, '@', -1) domain, COUNT(*) email_count
FROM users
GROUP BY substring_index(email, '@', -1)
ORDER BY email_count DESC, domain;

C. To describe user activity in order to identify suspect users (for ojs 2.x):

SELECT
    u.user_id,
    u.username,
    u.email,
    u.first_name,
    u.middle_name,
    u.last_name,
    u.url,
    u.disabled,
    u.date_registered,
    u.date_validated,
    bit_or(r.role_id) roles,
    count(distinct a.article_id) articles,
    count(distinct e.article_id) editing,
    count(distinct v.review_id) reviews,
    group_concat(b.setting_value ORDER BY b.setting_value separator ' ') biography,
    group_concat(c.body ORDER BY c.body separator ' ') comments,
    group_concat(s.setting_value ORDER BY s.setting_value separator ' ') signature,
    count(distinct c.comment_id) comment_count
FROM
    users u
    left outer join user_settings b on (u.user_id = b.user_id and b.setting_name = 'biography')
    left outer join user_settings s on (u.user_id = s.user_id and s.setting_name = 'signature')
    left outer join comments c on (u.user_id = c.user_id)
    left outer join roles r on (u.user_id = r.user_id)
    left outer join articles a on (u.user_id = a.user_id and a.status in (1, 3))
    left outer join review_assignments v on (u.user_id = v.reviewer_id)
    left outer join edit_assignments e on (u.user_id = e.editor_id)
GROUP BY
    u.user_id,
    u.username,
    u.email,
    u.first_name,
    u.middle_name,
    u.last_name,
    u.url,
    u.disabled,
    u.date_registered,
    u.date_validated;

Warning: This query mixes a lot of data and form big DBs it query can take a loooong time.

D. Users with an URL in their bio

SELECT username FROM users 
WHERE url LIKE IS NOT NUL

Warning: It’s a real users like to add their real urls, but the fact is usually spammers create fake users to promote their webs so this is a good way to find fake users.

1 Like