My site has been attacked and now I have 78,000 spam comments. What's the best way to get rid of them?

By the way, I have the Mollom module installed now; the spam attack happened before I installed it.

Comments

WorldFallz’s picture

Do you care about keeping any of the comments? Is there some way to tell which are spam programmatically?

Marc Bijl’s picture

Hi, thnx for the reply.

Typical for these comments: strange author, strange subject, body with loads of links. E.g.

  • Author: vicodin-and-aleve
  • E-mail: vicodinand58@gmail.com
  • Website: http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/vicodin_and_aleve__vicodin_and_sex__vicodin_and_breastfeeding
  • Subject: anHSiRetJOwXzUcyMr
  • Body:
    <a href="http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/vicodin_and_aleve__vicodin_and_sex__vicodin_and_breastfeeding">vicodin and aleve</a> or <a href="http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/order_vicodin_legally__no_perscription_vicodin__other_names_for_vicodin">order vicodin legally</a> or <a href="http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/is_lortab_stronger_or_vicodin__is_vicodin_safe_for_dog__vicodin_dependency">is lortab stronger or vicodin</a> or <a href="http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/online_doctors_prescribe_vicodin__what_is_in_vicodin__morphine_and_vicodin">online doctors prescribe vicodin</a> or <a href="http://www.webjam.com/danarebewcate024/$my_blog/2010/03/25/vicodin_while_pregnant__2410_vicodin__how_to_detox_vicodin">vicodin while pregnant</a>
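
The traits listed above (a random-looking subject and a body made almost entirely of links) can be checked programmatically. A minimal sketch in Python, assuming the comments have been exported as plain dicts; the field names, the `looks_like_spam` helper and the threshold of three links are illustrative assumptions, not part of Drupal or Mollom:

```python
import re

# Matches the opening of an HTML anchor tag that carries an href attribute.
LINK_RE = re.compile(r"<a\s+[^>]*href=", re.IGNORECASE)


def count_links(body):
    """Return the number of HTML links found in a comment body."""
    return len(LINK_RE.findall(body))


def looks_like_spam(comment, max_links=3):
    """Flag a comment whose body has more than `max_links` links.

    `comment` is a hypothetical dict with "subject" and "body" keys.
    """
    return count_links(comment["body"]) > max_links


spam = {
    "subject": "anHSiRetJOwXzUcyMr",
    "body": " or ".join(
        '<a href="http://example.com/page%d">link %d</a>' % (i, i)
        for i in range(5)
    ),
}
ham = {"subject": "Nice post", "body": "Thanks, this helped me a lot."}

print(looks_like_spam(spam))  # True
print(looks_like_spam(ham))   # False
```

A link count alone will misfire on legitimate comments that cite several sources, so a rule like this is only a first filter for review, not proof of spam.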

I don't care about keeping any of the comments; I'm still building the site, so all comments can be deleted.

Of course it's easy to empty the comment table, but I need to be sure there's no related data in other tables; I want to have it as clean as possible ;)

WorldFallz’s picture

Marc Bijl’s picture

Thnx mate ;)

I've seen that one before.
You're sure no other tables have data related to comments?

WorldFallz’s picture

don't think so, but can't swear to it.

videowhisper’s picture

From phpMyAdmin, run this query on that database to remove all comments:
TRUNCATE TABLE `comments`;

You can also remove other data in a similar way, as explained at:
http://www.videoconference-software.com/content/delete-all-spam-and-clea...

ralph.mueller.de’s picture

Very helpful, thanks!

doomed’s picture

On Drupal 7, the table is called "comment", not "comments".

pkosenko’s picture

You can do this from the MySQL command line. You should probably take the site offline first. It involves 3 database tables.

Remember that you need to do the DELETE of items in the comment table last, since the first two deletes have joins that depend on data in the comment table.

mysql> USE site_database;

mysql> DELETE FROM field_data_comment_body
-> USING field_data_comment_body
-> JOIN comment
-> ON comment.cid=field_data_comment_body.entity_id
-> AND comment.status=0;
Query OK, 94581 rows affected (1 min 5.68 sec) <-- UNAPPROVED COMMENTS (mostly SPAM)

mysql> DELETE FROM field_revision_comment_body
-> USING field_revision_comment_body
-> JOIN comment
-> ON comment.cid=field_revision_comment_body.entity_id
-> AND comment.status=0;
Query OK, 94581 rows affected (57.67 sec) <-- UNAPPROVED COMMENTS (mostly SPAM)

mysql> DELETE FROM comment WHERE status=0;
Query OK, 94581 rows affected (30.90 sec) <-- UNAPPROVED COMMENTS (mostly SPAM)

mysql> exit
Bye

Remember to use Drush to clear all caches. Change directory (cd) back to wherever the root of your Drupal installation is if needed.

$ drush cc all
'all' cache was cleared.
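
The ordering rule above can be tried out safely before touching a real site. The following is a rough sketch in Python using an in-memory SQLite database as a stand-in for MySQL. The three tables are heavily simplified assumptions (the real Drupal 7 field tables have more columns), and because SQLite has no multi-table `DELETE ... USING ... JOIN`, the sketch swaps in an `IN (SELECT ...)` filter. The `orphan_count` helper is hypothetical and simply counts field rows whose comment is gone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (cid INTEGER PRIMARY KEY, nid INTEGER, status INTEGER);
    CREATE TABLE field_data_comment_body (entity_id INTEGER, comment_body_value TEXT);
    CREATE TABLE field_revision_comment_body (entity_id INTEGER, comment_body_value TEXT);
""")

# Three comments on node 1: cid 1 is approved, cids 2 and 3 are unapproved.
rows = [(1, 1, 1), (2, 1, 0), (3, 1, 0)]
conn.executemany("INSERT INTO comment VALUES (?, ?, ?)", rows)
for table in ("field_data_comment_body", "field_revision_comment_body"):
    conn.executemany(
        "INSERT INTO %s VALUES (?, ?)" % table,
        [(cid, "body %d" % cid) for cid, _, _ in rows],
    )

UNAPPROVED = "SELECT cid FROM comment WHERE status = 0"

# Field tables first: their filter depends on rows still present in comment.
for table in ("field_data_comment_body", "field_revision_comment_body"):
    conn.execute("DELETE FROM %s WHERE entity_id IN (%s)" % (table, UNAPPROVED))
conn.execute("DELETE FROM comment WHERE status = 0")


def orphan_count(table):
    """Count field rows whose comment no longer exists."""
    sql = (
        "SELECT COUNT(*) FROM %s f "
        "LEFT JOIN comment c ON c.cid = f.entity_id "
        "WHERE c.cid IS NULL" % table
    )
    return conn.execute(sql).fetchone()[0]


remaining = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(remaining)                                    # 1
print(orphan_count("field_data_comment_body"))      # 0
print(orphan_count("field_revision_comment_body"))  # 0
```

If the `comment` delete ran first, the two field deletes would match nothing, and `orphan_count` would report 2 for each field table.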

Jaypan’s picture

Directly deleting stuff from the database is almost never a good idea in Drupal. With fields and various other modules hooking into processes, this is a good way to leave ghosts/remnants in your system that can cause problems later.

You can use the function comment_delete() if you need to delete comments. This will ensure that everything that belongs to the comment is properly deleted (unless you are using a module that doesn't properly hook into the delete process).

pkosenko’s picture

That's generally true about deleting directly from the database, but I checked it out very carefully, which is why the deletion contains joins to the two field tables. Other people were deleting only from the comments table, which left a ton of data in the other two tables.

But I will take a look at the comment_delete() function, since I am fiddling around with writing a drush script to do the deed. Given the SHEER NUMBER of bad spam comments that people want to delete, comment_delete() might not be optimal if it only allows you to delete ONE COMMENT AT A TIME. What is needed is a batch operation that deletes everything that has the approval ("Status") flag set to 0.

Jaypan’s picture

The problem is that while the above may work on your system, it will not work for someone who has added fields to their comments, or who has a module that stores comment-related data in the database. It will leave ghosts.

comment_delete() might not be optimal if it only allows you to delete ONE COMMENT AT A TIME.

It does only delete one comment at a time, but we're also talking about a one-off script. It may take a while to run, but better to ensure you are doing things correctly.

pkosenko’s picture

Okay . . . good warnings to think about . . . but I am not sure how the delete_comments() function knows about any additional module functionality that "may be added to comments". If other modules are borrowing comment ids into their own tables, comment_delete() isn't going to delete them either. Do you have any examples of what you mean?

One thing I did NOT capture was the node_comment_statistics table, which needs to be rebuilt after the deletions. See: https://www.drupal.org/node/137458 So I will also look into the devel module code to see what THAT is doing, since I don't want to have to enable/disable a module to do it.
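
To illustrate what rebuilding that table amounts to, here is a rough sketch in Python against an in-memory SQLite database. The two tables are simplified assumptions (the real node_comment_statistics has more columns, such as the last commenter's name and uid), and the fallback timestamp of 0 for nodes without comments is only a placeholder for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (cid INTEGER PRIMARY KEY, nid INTEGER,
                          status INTEGER, created INTEGER);
    CREATE TABLE node_comment_statistics (nid INTEGER PRIMARY KEY,
                                          comment_count INTEGER,
                                          last_comment_timestamp INTEGER);
    -- Comments that survived the cleanup: two published ones on node 1.
    INSERT INTO comment VALUES (1, 1, 1, 100), (2, 1, 1, 200);
    -- Stale statistics, as if deleted comments were still being counted.
    INSERT INTO node_comment_statistics VALUES (1, 50, 900), (2, 7, 800);
""")

# Recompute every row from the published comments that still exist.
conn.execute("""
    UPDATE node_comment_statistics
    SET comment_count = (
            SELECT COUNT(*) FROM comment c
            WHERE c.nid = node_comment_statistics.nid AND c.status = 1),
        last_comment_timestamp = COALESCE((
            SELECT MAX(c.created) FROM comment c
            WHERE c.nid = node_comment_statistics.nid AND c.status = 1), 0)
""")

stats = conn.execute(
    "SELECT nid, comment_count, last_comment_timestamp "
    "FROM node_comment_statistics ORDER BY nid"
).fetchall()
print(stats)  # [(1, 2, 200), (2, 0, 0)]
```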

99,847 spam comments is not something one wants to do "one at a time."

Maybe what is needed is a module project that has a BULK "delete all unapproved comments" button (delete_comments() does NOT allow selection by approval flag, so it actually WON'T work BY ITSELF) and that is vetted by enough other Drupal developers that worrying about orphaned data isn't an issue.

pkosenko’s picture

By the way, are comments "fieldable" entities in Drupal 7, or is that only in Drupal 8? If you can't add fields to Drupal 7 comments, there isn't much to worry about in "missing" other field tables.

Jaypan’s picture

By the way, are comments "fieldable" entities in Drupal 7

Yes: https://www.drupal.org/node/774808

Jaypan’s picture

Note: the function is comment_delete(), not delete_comments().

If other modules are borrowing comment ids into their own tables, comment_delete() isn't going to delete them either. Do you have any examples of what you mean?

They will if they are built properly. Modules can act on hook_comment_delete() to delete anything in their tables when a comment is deleted. Any module that is properly developed will do so.
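
The difference between the two approaches can be shown with a toy model. This is a conceptual sketch in Python, not Drupal code: the dicts stand in for database tables, the "ratings" module is hypothetical, and `api_delete` only imitates the idea of an API function that invokes every module's delete hook:

```python
# Toy stand-ins for database tables.
comments = {1: "spam", 2: "spam"}
ratings = {1: 5, 2: 4}  # rows owned by a hypothetical "ratings" module

delete_hooks = []  # callbacks registered by modules


def ratings_comment_delete(cid):
    """The hypothetical module's cleanup, analogous to hook_comment_delete()."""
    ratings.pop(cid, None)


delete_hooks.append(ratings_comment_delete)


def api_delete(cid):
    """Delete through the API: remove the comment, then notify every module."""
    comments.pop(cid, None)
    for hook in delete_hooks:
        hook(cid)


def raw_delete(cid):
    """Delete the row directly: no module is told about it."""
    comments.pop(cid, None)


api_delete(1)
raw_delete(2)
print(sorted(comments))  # []
print(sorted(ratings))   # [2]
```

Both comments are gone, but the rating attached to comment 2 is left behind because nothing told the module to clean it up.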

99,847 spam comments is not something one wants to do "one at a time."

I've done it on a site. Sure it takes a while for the script to run, but again, it's a one-off script, and better to do it the right way.

(delete_comments() does NOT allow selection by approval flag, so it actually WON'T work BY ITSELF)

Well, what you would do is something like this:

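// Drupal 7 names: the table is {comment}, the ID column is cid, and
// unapproved comments have status = COMMENT_NOT_PUBLISHED (0).
$cids = db_query('SELECT cid FROM {comment} WHERE status = :status', array(':status' => COMMENT_NOT_PUBLISHED))->fetchCol();
foreach ($cids as $cid) {
  // Goes through the API so fields and module hooks are cleaned up too.
  comment_delete($cid);
}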

You would need to come up with a query to pull all the IDS that you want to delete before calling comment_delete() on them.
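
With tens of thousands of IDs, it may help to work through the list in slices so a long run can be monitored or resumed. A small language-neutral sketch in Python; `chunks`, `delete_in_batches` and the batch size of 500 are illustrative assumptions, and `delete_batch` stands for whatever API call actually deletes a list of comments:

```python
def chunks(ids, size):
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_in_batches(ids, delete_batch, size=500):
    """Call `delete_batch` once per slice and return the number of IDs handled."""
    done = 0
    for batch in chunks(ids, size):
        delete_batch(batch)
        done += len(batch)
    return done


calls = []
total = delete_in_batches(list(range(1, 1201)), calls.append, size=500)
print(total)                    # 1200
print([len(b) for b in calls])  # [500, 500, 200]
```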

As for your module idea, any such generic module would have to use comment_delete() otherwise it would leave ghosts in the system, as it would not know what fields or other module tables would have relevant data in them.

WorldFallz’s picture

I couldn't agree with jay more.

99,847 spam comments is not something one wants to do "one at a time."

And how many ghosts, or how much garbage, do 99,847 incorrect deletes risk leaving behind? Related problems might not surface for months, at which point a roll-back would likely be impossible.

I don't usually use the word "never", but cutting corners like this is never a good idea. The smidgen of time you save now risks hours of issues and lost time later. Sorry, but in this case, it's just a no-brainer.

pkosenko’s picture

Actually, the SQL queries that you see in my post DO get rid of ALL the unapproved comments. I have been testing for the past week in a variety of ways. I wouldn't recommend doing it EXCEPT after testing it out first, if you aren't comfortable. Create a test database/site.

I guess you guys need to take a look into the actual drupal CODE, which also uses SQL queries to do things.

At some point you need to get familiar with the underlying SQL and tables in order to start getting comfortable writing Drush scripts and modules.

Set it up and TEST my code . . . before throwing it in the trash.

Final comment.

Jaypan’s picture

Actually, the SQL queries that you see in my post DO get rid of ALL the unapproved comments.

That's not the point. We have not claimed that they don't. The point is that for anyone who may come across this thread, if they have any modules that act upon comments, or if the comments have any fields, this will leave ghosts in the system, and as such, they are better off not using that code, and using comment_delete() instead. Even you yourself found that your method missed out on rebuilding the node_comment_statistics table, whereas using comment_delete() would not have that problem. What else have you missed that you haven't realized yet? What if you realize it months later? We are talking best practices, and directly deleting from the database is most definitely not a best practice.

If you want to use SQL queries to delete the comments, you are more than welcome, your Drupal installation is yours, and you can do whatever you want. But we are making the point so that someone else doesn't come along and follow the code you've written, and leave themselves in a bad spot.

On top of that, how do you know that a module you have installed doesn't change a variable upon comment delete or something like that? Have you checked? If it does, you may not find out for months as WorldFallz mentions.

I wouldn't recommend doing it EXCEPT after testing it out first, if you aren't comfortable. Create a test database/site.

So you'd rather spend time creating test database sites, digging through the database tables, and hoping that you've caught everything, than run a script that would take much, much less time to build and be much more stable in the long run, just because the script takes too long to run? Forgive me for having trouble understanding your priorities here.

I guess you guys need to take a look into the actual drupal CODE, which also uses SQL queries to do things.

Read back through my tracker - I know Drupal 7 (and 6) code inside out and backwards. I've contributed multiple modules, and written patches for core, and I am very clear on how the database is built. I am an Acquia certified Drupal grandmaster (their term, not mine), meaning I've achieved all three certifications - back end specialist, front end specialist, and the general certification. Trust me, I know Drupal code, and I know the underlying SQL, and I know these better than 99% of the people out there. Can you say the same?

I think WorldFallz can. Last time I talked to her, she was working on the third of the three certifications. She is one of the few people out there who knows Drupal as well as I do, and often better depending on what we are talking about (we focus on different things).

At some point you need to get familiar with the underlying SQL and tables in order to start getting comfortable writing Drush scripts and modules.

I know the underlying tables. And I write custom Drush scripts for every single site I build. I'm not so much a module user; I use Drupal as a framework, and code most of my sites using the Drupal API. When I told you to use comment_delete(), I'd put the whole comment delete script in a Drush script myself - and I still wouldn't directly touch the database.

I'm the guy who writes a module that does stuff to the database. Then I write API functions, like comment_delete(), that handle database interactions so that people don't have to directly interact with the database, because that is the Drupal way to do things. It's almost never a good idea to directly touch the database unless it's in a module you have developed that put the data into the database in the first place.

To summarize - if you want to use SQL queries to do anything on your site, go ahead. But for anyone else reading this thread, that's probably a bad idea, not just in this case, but in most cases.

WorldFallz’s picture

as usual, +1 to everything Jay said.

And yep-- I'm a 'grandmaster' as well. Last time I checked there were only 65 of us globally.

And just for fun, I also took and passed the Certified Site Builder exam as well. ;-)

Jaypan’s picture

Nice! Congratulations.

WorldFallz’s picture

molto grazie! :-)

Jaypan’s picture

As a final comment to pkosenko, please don't let my comments discourage you from posting code or responding to threads in the future. Your contribution is appreciated, and while this time there is a preferable option to what you posted, in other threads I'm sure your posts can help the community. The goal is to spread information to help each other.

strilok’s picture

There is no way to stop spam, because we don't know where the hidden link is. We can only ban spammers when we recognise spammy content on the forum.
If someone knows how to detect spam, please let me know.

Jaypan’s picture

That has nothing to do with the current topic.