I've been trying to figure out how to cut down on spam on Drupal.org. I'm working on an automated solution, but in the meantime, I've come up with some queries that give us a good number of spammers to ban.
(Note: all queries run against a dev site that's a few months old, so numbers may not be accurate)
For instance,
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%viagra%" and u.status=1;
gives us 51 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%ugg%" and u.status=1;
gives us a whopping 852 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%porn%" and u.status=1;
...345 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%sex%" and u.status=1;
...874 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%cialis%" and u.status=1;
...961 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%levitra%" and u.status=1;
...14 spammers (evidently, this is not so popular anymore)
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%phentermine%" and u.status=1;
...26 spammers (nor is this one)
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%prescription%" and u.status=1;
...61 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%pharmacy%" and u.status=1;
...73 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%slots%" and u.status=1;
...45 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%bingo%" and u.status=1;
...56 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%poker%" and u.status=1;
...186 spammers.
select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%casino%" and u.status=1;
...193 spammers.
select uid,comment from comments where comment like "%dofollow%";
This only gives 2, but I've seen the dofollow "trick" a lot on Drupal.org - in fact, the automated solution that I'm working on will look for that, as it's a good indication that the commenter is a spammer.
Altogether: 3,739 spammers (though they may not all be unique).
Have fun :)
Comments
Comment #1
mikey_p commentedThese queries don't filter on specific field in profile_values and a few of these could conceivably overlap in other fields such as industries worked, companies, etc. Some of these could easily be legitimate interests of users that aren't spammers. (Just search d.o for poker, there's even a project application for a drupal poker module.) Because of that it'd probably be best to also query on date of last login, number of posts/comments, etc or manually review the results from these queries.
For example check for users that have never logged in, with u.login = 0 or users that have not accepted git consent (very unlikely that anyone that has is a spammer) with u.git_consent = 1.
Comment #2
cweagansNew query:
For your reference, here's the fields I'm looking at (basically anywhere on the user profile settings where there's a textarea):
Returns 1749 rows on the spam dev site.
Comment #3
killes@www.drop.org commentedwe blocked a couple 10k of these.