In the past couple of months, I've begun to notice the occasional 'comment spam' posted on my website. These comments tend to consist of a short string of nonsense "mad-lib" style text followed by a large number of offsite links. I currently use the tracker module to at least glance at every comment left on my website, so I eventually find this spam and delete it manually. However, as the rate of comment spam has increased, I've been looking for a better way to deal with it.
Not wanting to reinvent the wheel, I began by looking at SpamAssassin and other free anti-spam tools. I had hoped to integrate one of these tools into Drupal and let it do the actual work of deciding whether or not a given comment was spam. With further research, I found that this wasn't very workable: these anti-spam tools tend to be very mail-centric, looking at more than just the body of an email. Instead, I read up on Bayesian filtering and ultimately decided it would be best to write a simple Bayesian filter in PHP.