I got very little (read: NO) interest when I posted on this topic earlier with a different phrasing. So, here's my second shot at this.
I propose that there is a problem with the ways that program function URLs are written in Drupal, that causes Drupal to be a disproportionate target for trackback and comment spammers.
The problem with comment and trackback spam in Drupal is this: It's too easy to guess the URL for comments and trackbacks.
In Drupal, the link for a node has the form "/node/x", where x is the node id. In fact, you can formulate a lot of Drupal URLs that way; for example, to track-back to x, the URI would be "/node/x/trackback"; to post a comment to x would be "/node/x/comment". So you can see that it would be a trivially easy task to write a script that just walked the node table from top to bottom, trying to post comments.
Which is pretty much what spammers do to my site: They fire up a 'bot to walk my node tree, looking for nodes that are open to comment or accepting trackbacks. I have some evidence that it's different groups of spammers trying to do each thing, but that hardly matters -- what does matter is that computational horsepower and network bandwidth cost these guys so little that they don't even bother to stop trying after six or seven hundred failures -- they just keep on going, like the god damned energizer bunny. For the first sixteen days of August this year, I got well over 100,000 page views, of which over 95% were my 404 error page. The "not found" URL in over 90% of those cases was some variant on a standard Drupal trackback or comment-posting URL.
One way to address this would be to use something other than a sequential integer as the node ID. This is effectively what happens with tools like MoveableType and Wordform/WordPress because they use real words to form the path elements in their URIs -- for example, /archives/2005/07/05/wordform-metadata-for-wordpress/, which links to an article on Shelley Powers's site. Whether those real words correspond to real directories or not is kind of immaterial; the important point is that they're impractically difficult to crank otu iteratively with a simple scripted 'bot. Having to discover the links would probably increase the 'bot's time per transaction by a factor of five or six. Better to focus on vulnerable tools, like Drupal.
But the solution doesn't need to be that literal. What if, instead of a sequential integer, Drupal assigned a Unix timestamp value as a node ID? That would introduce an order of complexity to the node naming scheme that isn't quite as dramatic as that found in MT or WordPress, but is still much, much greater than what we've got now. Unless you post at a ridiculous frequency, it would guarantee unique node IDs. And all at little cost in human readability (since I don't see any evidence that humans address articles or taxonomy terms by ID number, anyway).