Return spammers username:
01. Phone/Fax number starting by “12345%”
SELECT username FROM users
WHERE phone LIKE "12345%";
False positives: Low.
This is usually safe, but sometimes real users add a fake phone/fax.
02. Users promoting webs in their bio:
SELECT username FROM users
WHERE url LIKE "%porn%"
OR url LIKE "%loans%"
OR url LIKE "%games%"
OR url LIKE "%amoxicillin%"
OR url LIKE "%cheap%"
OR url LIKE "%buy%"
False positives: Middle.
Too much combinations, too short or too generic filters will return a lot of false positives.
03. Registered users with NO role
SELECT username FROM users
WHERE user_id NOT IN (SELECT user_id FROM roles)
False positives: Low.
Would return false positives with the users that used to have a role in the journal but it was revoked.
04. Users that registered and login in less than 3 seconds
SELECT * FROM users
WHERE user_id NOT IN (SELECT user_id FROM roles)
AND (UNIX_TIMESTAMP(date_last_login) - UNIX_TIMESTAMP(date_registered) < 3)
AND must_change_password = 0
False positives: ??
This is supposed to be safe because only robots can go so fast, but I’m getting a lot of false positives.
05. Same name, middle name and surname
SELECT username
FROM users
WHERE last_name LIKE first_name
AND middle_name LIKE first_name
A variant, a little bit more risky would be:
SELECT username
FROM users
WHERE last_name LIKE CONCAT(first_name, "%")
AND middle_name LIKE CONCAT(first_name, "%")
False positives: Low.
You can also try without middle_name, but triple check is safer. The variant can match real users like “Steve Stevenson”
06. Phones with wrong length (11 digits)
SELECT * FROM users
WHERE phone REGEXP "^[0-9]{11}$"
False positives: High
It’s quite common to se real users adding a fake phone/fax or with typing errors.
07. Mailing address with “@”
SELECT username FROM users
WHERE `mailing_address` LIKE '%@%'
False positives: High
This one could fail a lot. As robots, some humans also are confused and add their mail to this field.
08. BBcode [url= signature
SELECT u.username FROM DATABASE.users u
INNER JOIN DATABASE.user_settings us ON u.user_id=us.user_id
WHERE us.setting_name="signature"
AND us.setting_value LIKE "%[url=%";
To identify spam
A. Users from well-known spam domains:
SELECT username FROM users
WHERE email LIKE "%eamale%"
B. List of most used domain names in mail.
SELECT substring_index(email, '@', -1) domain, COUNT(*) email_count
FROM users
GROUP BY substring_index(email, '@', -1)
ORDER BY email_count DESC, domain;
C. To describe user activity in order to identify suspect users (for ojs 2.x):
SELECT
u.user_id,
u.username,
u.email,
u.first_name,
u.middle_name,
u.last_name,
u.url,
u.disabled,
u.date_registered,
u.date_validated,
bit_or(r.role_id) roles,
count(distinct a.article_id) articles,
count(distinct e.article_id) editing,
count(distinct v.review_id) reviews,
group_concat(b.setting_value ORDER BY b.setting_value separator ' ') biography,
group_concat(c.body ORDER BY c.body separator ' ') comments,
group_concat(s.setting_value ORDER BY s.setting_value separator ' ') signature,
count(distinct c.comment_id) comment_count
FROM
users u
left outer join user_settings b on (u.user_id = b.user_id and b.setting_name = 'biography')
left outer join user_settings s on (u.user_id = s.user_id and s.setting_name = 'signature')
left outer join comments c on (u.user_id = c.user_id)
left outer join roles r on (u.user_id = r.user_id)
left outer join articles a on (u.user_id = a.user_id and a.status in (1, 3))
left outer join review_assignments v on (u.user_id = v.reviewer_id)
left outer join edit_assignments e on (u.user_id = e.editor_id)
GROUP BY
u.user_id,
u.username,
u.email,
u.first_name,
u.middle_name,
u.last_name,
u.url,
u.disabled,
u.date_registered,
u.date_validated;
As far as the data model changed, for 3.x the query could be something like:
SELECT
u.user_id,
u.username,
group_concat( DISTINCT(grp.user_group_id) ORDER BY grp.user_group_id separator '|') userGroupIDs,
max(giv.setting_value) givenName,
max(fam.setting_value) familyName,
max(orc.setting_value) orcid,
max(aff.setting_value) affiliation,
u.email,
u.url,
u.disabled,
u.date_registered,
u.date_validated,
max(bio.setting_value) as biography,
max(com.body) comments,
max(sig.setting_value) signature,
count(distinct rev.review_id) reviews_count,
count(distinct com.comment_id) comment_count
FROM
users u
left outer join user_settings giv on (u.user_id = giv.user_id and giv.setting_name = 'givenName')
left outer join user_settings fam on (u.user_id = fam.user_id and fam.setting_name = 'familyName')
left outer join user_settings orc on (u.user_id = orc.user_id and orc.setting_name = 'orcid')
left outer join user_settings aff on (u.user_id = aff.user_id and aff.setting_name = 'affiliation')
left outer join user_settings bio on (u.user_id = bio.user_id and bio.setting_name = 'biography')
left outer join user_settings sig on (u.user_id = sig.user_id and sig.setting_name = 'signature')
left outer join comments com on (u.user_id = com.user_id)
left outer join review_assignments rev on (u.user_id = rev.reviewer_id)
left outer join user_user_groups grp on (u.user_id = grp.user_id)
GROUP BY
u.user_id,
u.username,
u.email,
u.url,
u.disabled,
u.date_registered
LIMIT 10;
Warning: This query mixes a lot of data and form big DBs it query can take a loooong time.
D. Users with an URL in their bio
SELECT username FROM users
WHERE url LIKE IS NOT NUL
Warning: It’s a real users like to add their real urls, but the fact is usually spammers create fake users to promote their webs so this is a good way to find fake users.