How do I combat SPAM?

Re: the user role question: registrants may be auto-enrolling as authors or readers. I may be able to provide a SQL query that gets these. Any idea whether people are mostly registering as authors, or readers, or both?

James

Both, I think.

+---------+----------+
| role_id | COUNT(1) |
+---------+----------+
|       1 |        1 |
|      16 |        2 |
|     256 |        3 |
|     512 |        3 |
|     768 |        1 |
|    4096 |       43 |
|   65536 |     6360 |
| 1048576 |     6818 |
+---------+----------+

Another good heuristic:

delete from users where last_name LIKE CONCAT(first_name, "%");

Hi @ambs,

Watch out for common names like Sven Svensson.

Regards,
Alec Smecher
Public Knowledge Project Team
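To make the caution above concrete, here is a minimal Python sketch that previews what the name-prefix heuristic would match before anything is deleted. It uses an in-memory SQLite table as a stand-in for the real users table, so the three sample rows and the function name preview_name_prefix_matches are hypothetical. The extra check for an empty first_name is an addition not present in the query above: without it, an empty first name turns the pattern into "%" and matches every row.

```python
import sqlite3


def preview_name_prefix_matches(conn):
    """Return the rows the heuristic would match, without deleting anything."""
    # SQLite spells CONCAT(first_name, '%') as first_name || '%'.
    # The first_name <> '' guard is an added assumption, see the lead-in.
    return conn.execute(
        "SELECT user_id, first_name, last_name FROM users "
        "WHERE first_name <> '' AND last_name LIKE first_name || '%' "
        "ORDER BY user_id"
    ).fetchall()


# Hypothetical sample data: one real-looking name, one bot-like name, one control.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Sven", "Svensson"),    # legitimate name that the heuristic still matches
        (2, "xkqzpl", "xkqzplRT"),  # typical generated spam name
        (3, "Maria", "Lopez"),      # not matched
    ],
)

matches = preview_name_prefix_matches(conn)
for row in matches:
    print(row)
```

Both user 1 and user 2 come back, so the result is best treated as a list to review by hand rather than something to feed straight into a DELETE.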

Great instructions here, thanks! I just removed 6000 spam accounts.

Maybe OJS could have a built-in honeypot field? I mean just add a text field hidden with CSS to the registration form and a corresponding field to the users table. Then you could just check for users that have some value in that field and be fairly sure it is a bot.

I would like to support ajnyga’s suggestion of having an inbuilt honeypot field. This would very easily and quickly let us locate most of the spam accounts.
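As a rough illustration of the honeypot idea, here is a minimal Python sketch of the server-side check. It is not OJS code: the field name website_url and the function is_probable_bot are hypothetical, and the form is assumed to arrive as a plain dictionary of submitted values.

```python
# Hypothetical name of the CSS-hidden field; humans never see it, so they leave it empty.
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form_data):
    """Return True when the hidden honeypot field was filled in."""
    value = form_data.get(HONEYPOT_FIELD) or ""
    # Whitespace-only input counts as empty.
    return bool(value.strip())


# Usage: store the result as a flag on the account, or reject the registration outright.
print(is_probable_bot({"username": "alice", "website_url": ""}))
print(is_probable_bot({"username": "bot42", "website_url": "http://spam.example"}))
```

The same check works for either approach: flag the account for later review, or deny the registration on the spot.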

Another query that can be useful is one to find the email address domains used the most, which can help identify suspicious domains that have the most spam users.

SELECT substring_index(email, '@', -1) domain, COUNT(*) email_count
FROM users
GROUP BY substring_index(email, '@', -1)

-- If you want to sort as well:
ORDER BY email_count DESC, domain;
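For anyone who would rather run this over an exported list of addresses, here is a minimal Python sketch of the same count. The function name count_email_domains and the sample addresses are hypothetical. Two choices differ from the SQL and are assumptions: addresses without an "@" are skipped, and domains are lowercased before counting.

```python
from collections import Counter


def count_email_domains(emails):
    """Count addresses per domain, most frequent first, ties sorted by domain."""
    # rsplit on the last '@' mirrors substring_index(email, '@', -1).
    domains = (e.rsplit("@", 1)[-1].lower() for e in emails if "@" in e)
    counts = Counter(domains)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data.
sample = [
    "a@spam.example",
    "b@spam.example",
    "c@uni.example",
    "D@Uni.Example",
    "e@spam.example",
]
for domain, email_count in count_email_domains(sample):
    print(domain, email_count)
```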

We created a formHoneypot plugin which tags an existing field or adds a new field as a honeypot. We used the honeypot to directly deny registration rather than to flag it for removal later. I’d be interested in hearing thoughts from @ajnyga and @Eirik_Hanssen regarding why we might want to allow the registration (rather than deny it directly) as we prepare to port this from 2.x to 3.x.

Also tagging @AlexWreschnig.

I was mainly thinking whether bots will learn not to fill the honeypot field. If the removal is done a bit later, they probably will not catch on, but if the registration fails immediately, they might? This is purely speculative.

(but very nice plugin!)

Bots typically don’t learn… doesn’t mean they won’t start in the future, but it’s been a very safe approach so far?

Yesterday a fellow asked me about how to deal with fake spam users… and I applied the 3 suggested methods:

  • Replace the captcha font with a better one (Punktype | dafont.com)
  • Set “require_validation = On” and “validation_timeout = 20” in config.inc.php
  • And install @ctgraham 's formHoneypot plugin (really smart solution. thanks a lot!)

No spam users in the last 24 hours.
No need to put my users to work for Google for free (as you do with “reCaptcha”).

Thanks!


Cool Kaitlin!
I’m adding the same SQL but referring to Postgres:

SELECT substring(email from '@.*') AS domain, COUNT(*) email_count
FROM users
GROUP BY 1
ORDER BY 2 DESC

About HoneyPot:
It worked: there was a substantial reduction in the number of new daily users. However, some spambot registrations still get through because they don’t fill in the honeypot field. One suggestion is to add a second optional field as another honeypot field…

About enable account validation:
Account validation uses the values of the users table fields “date_registered”, “date_validated”, “date_last_login” and “disabled”, am I right? Could a SQL query be built on these fields to remove users?
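One possible shape for such a query, as a minimal Python sketch against an in-memory SQLite table: list accounts whose date_validated is still empty once a grace period has passed. The column names are taken from the question above. Everything else is an assumption: the sample rows, the function name find_unvalidated, and the 20-day grace period, which mirrors the validation_timeout value quoted earlier and is assumed to be in days. Accounts created by a manager, or registered before validation was switched on, may also have an empty date_validated, so the result should be reviewed before removing anyone.

```python
import sqlite3
from datetime import datetime, timedelta


def find_unvalidated(conn, now, grace_days):
    """Return user_ids never validated and registered before the grace cutoff."""
    cutoff = (now - timedelta(days=grace_days)).strftime("%Y-%m-%d %H:%M:%S")
    rows = conn.execute(
        "SELECT user_id FROM users "
        "WHERE date_validated IS NULL AND date_registered < ? "
        "ORDER BY user_id",
        (cutoff,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data; timestamps are ISO strings so they compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, "
    "date_registered TEXT, date_validated TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", None),                   # old and never validated
        (2, "2020-01-01 10:00:00", "2020-01-02 08:30:00"),  # validated
        (3, "2020-01-25 10:00:00", None),                   # still inside the grace period
    ],
)

stale = find_unvalidated(conn, now=datetime(2020, 1, 31), grace_days=20)
print(stale)
```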

[OJS3.1] We have thousands of fake accounts which were created on a certain date and have never logged in to a journal. I am not an expert. Can I use:

SELECT * FROM users WHERE CAST(date_registered AS DATE) = CAST(date_last_login as date)


We did it slightly differently (difference between registering and last login of less than 3 seconds), but it worked very efficiently (OJS 3.1.2.4):

SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM roles) AND (UNIX_TIMESTAMP(date_last_login) - UNIX_TIMESTAMP(date_registered) < 3) and must_change_password = 0

*edit: I forgot the last part which excludes users, especially potential reviewers, created by the administrator but which never logged in themselves.
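Here is a minimal Python sketch of how the three conditions in that query interact, run against an in-memory SQLite database. The table and column names follow the query above; the sample rows and the function name find_instant_login_accounts are hypothetical. SQLite has no UNIX_TIMESTAMP, so strftime('%s', ...) stands in for it here.

```python
import sqlite3

QUERY = """
SELECT user_id FROM users
WHERE user_id NOT IN (SELECT user_id FROM roles)
  AND (CAST(strftime('%s', date_last_login) AS INTEGER)
       - CAST(strftime('%s', date_registered) AS INTEGER)) < 3
  AND must_change_password = 0
ORDER BY user_id
"""


def find_instant_login_accounts(conn):
    """Return user_ids with no role whose last login is within 3 seconds of registration."""
    return [row[0] for row in conn.execute(QUERY)]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, date_registered TEXT, "
    "date_last_login TEXT, must_change_password INTEGER)"
)
conn.execute("CREATE TABLE roles (user_id INTEGER, role_id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", "2020-01-01 10:00:01", 0),  # matches all three conditions
        (2, "2020-01-01 10:00:00", "2020-01-01 10:00:01", 1),  # created by an administrator
        (3, "2020-01-01 10:00:00", "2020-01-01 10:00:01", 0),  # has a role, see below
        (4, "2020-01-01 10:00:00", "2020-03-01 09:00:00", 0),  # logged in again later
    ],
)
conn.execute("INSERT INTO roles VALUES (3, 65536)")

suspects = find_instant_login_accounts(conn)
print(suspects)
```

One caveat that holds in general SQL: NOT IN returns no rows at all if the subquery yields a NULL user_id, so it is worth checking that the roles table has none before trusting an empty result.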


Thank you all for this thread. It’s really useful.

I compiled all the SQL snippets in a wiki page here to keep working together on them:

Feel free to modify or extend.

We have created a PHP script (spamkiller.php) by copying the code presented here, changing nothing except two reference paths (file and echo exec).

As it is, this tool only works on one merge at a time, but it can be scripted. An example php script would be:…

We saved our spam list in a text file called “names.txt”. When we run the script as the root user (php spamkiller.php), it says …public_html/tools/mergeUsers.php: Permission denied
Are we missing something?

Does the Enable Account Validation feature work the same for accounts registered by the managers? Must the users registered by them validate their account too?

Also, does it clean previously registered users too?

Hi @pmelo, @gobeshona,

This is an older post, and OJS has changed significantly since it was written. Can you please create new, separate posts outlining your issues? This will help avoid clutter and keep the forum organized.

Best regards,

Roger
PKP Team