How to solve the comment spam problem
Anyone who has a blog with comments enabled has probably had the unpleasant experience of checking their e-mail and finding hundreds of comment notifications after being hit by a comment spammer.
A lot of time and effort has been spent trying to solve this problem, and yet the spam continues. I’ve been trying to think of some ingenious solution that will stop it, but I have yet to come up with anything. So I tried looking at it the other way around: instead of thinking about how to stop it, I thought about what I would want the solution to do for me.
The ideal solution would cost me no time or effort. It would allow all legitimate comments and block all spam comments. Of course, that requirement is incomplete without a clear definition of what makes a comment legitimate or spam. Some people post a single, manual comment because they want to advertise their web site, and while that’s not nearly as nefarious as what many spammers do, it’s still spam in my mind. Others would probably say it’s just a regular comment. Because people define spam differently, there are several acceptable solutions.
Now we’re left with the issue of how to make it effortless for the blog owner. To me, that means blocking spam before it ever enters the blog’s data storage, but that also means you run the risk of blocking valid comments. That’s a big deal, because most blog owners appreciate and enjoy receiving comments from their readers.
Up to this point, my solution has relied on WordPress’s ability to hold comments for moderation based on the number of links they contain. I can then approve the valid ones and delete the rest. Combined with a slight modification I made to the moderation page, it takes me two clicks to rid my entire blog of comment spam, and the spam never shows up on the site. But that hasn’t reduced the amount of spam I get. Apparently the spammers don’t care that it never shows up on the site, so they keep doing it and I keep getting ticked off by it. It was especially annoying to find that I had accidentally deleted a legitimate comment (sorry, Chad). If he hadn’t e-mailed me to ask about it, I never would have known I had deleted it.
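WordPress exposes link-based moderation as a setting. The Python sketch below only illustrates the rule itself and is not WordPress’s code. The threshold of 2 links and the simple URL-counting regex are assumptions.

```python
import re

# Assumed threshold: hold a comment once it contains this many links or more.
MAX_LINKS = 2

# Rough link detector: counts every http:// or https:// occurrence.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def count_links(text: str) -> int:
    """Return a rough count of the links in a comment body."""
    return len(URL_PATTERN.findall(text))


def needs_moderation(text: str, max_links: int = MAX_LINKS) -> bool:
    """Hold the comment for review when it contains max_links or more links."""
    return count_links(text) >= max_links
```

A comment with one link is posted directly. A comment with two or more links waits in the queue, which is where the two-click cleanup happens.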
I’m just brainstorming here, but maybe I could ask the commenter a question that only a regular reader would know the answer to. If they answered correctly, the comment would be posted immediately; if not, it would be moderated. The question could be asked only for comments that would otherwise be moderated anyway. I’ll have to think about that one. There has to be a balance between making it easy for people to comment and making it hard for spammers to advertise their wares.
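A minimal sketch of the reader-question idea follows. The question, its accepted answers, and the function names are all hypothetical placeholders that a blog owner would replace. The point is the routing: comments that would be posted anyway skip the question, and a wrong answer falls back to moderation instead of deletion.

```python
from typing import Dict, Set

# Hypothetical question -> accepted answers; the blog owner would supply real ones.
READER_QUESTIONS: Dict[str, Set[str]] = {
    "What color is this blog's header?": {"blue"},
}


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial typing differences still match."""
    return " ".join(answer.lower().split())


def route_comment(would_be_moderated: bool, question: str, answer: str) -> str:
    """Return 'publish' or 'moderate' for an incoming comment."""
    # Only comments that would otherwise be held get the question at all.
    if not would_be_moderated:
        return "publish"
    accepted = READER_QUESTIONS.get(question, set())
    if normalize(answer) in accepted:
        return "publish"
    # A wrong or missing answer falls back to ordinary moderation, not deletion.
    return "moderate"
```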
Other solutions I have thought of (mainly the product of a fit of rage after receiving 250+ spam comments in one afternoon) include going to the web sites listed in the comment spam and spamming them up the wazoo, but then I realized that traffic is exactly what they’re after, so that wouldn’t be very effective. Maybe you could clog their contact e-mail addresses or web forms instead. I also thought about complaining to their internet service providers or web hosting companies, until I found that many of them operate outside the US and may even run their own ISPs.
Wouldn’t it be great if we could team up in a full frontal assault on every web site that has ever been used in comment spam? If we could just make it too much work for them, or make it unprofitable, they would go away, but as long as they see some benefit from it they will continue to spread their filth.
If you’re a WordPress user, you may be interested in Kitten’s comment spam thoughts, where she reviews a few plugins for stopping comment spam. There’s also a centralized solution in development that may be effective, but we’ll have to wait and see.
I really like the suggestion of asking a reader a question. It’s like a friendlier form of visual verification.
Some comment systems can enable a ‘captcha’, an image of letters or a word with visual noise added to confuse character recognition software. To post a comment, you must first type the captcha key into a field. It’s a bit of a hassle for legitimate commenters, but I imagine it’s pretty effective.
Another idea, less intrusive and invisible to the commenter, would be to generate a one-time secret key when the page is viewed and put it in a hidden field in the comment submission form. I’m not sure exactly how comment spam scripts work, but it might stop particularly naive ones that just post to the form repeatedly without ever loading the page.
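Here is a minimal sketch of the one-time hidden key, assuming an in-memory store on the server. The class name, the `form_key` field name, and the one-hour lifetime are all hypothetical. Each key is issued when the page is rendered and can be redeemed only once. A bot that fetches the form before posting will still receive a valid key, so this only stops scripts that post blind.

```python
import secrets
import time


class FormTokenStore:
    """Issues one-time keys for a hidden comment-form field (in-memory sketch)."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._issued = {}  # key -> monotonic time it was issued

    def issue(self) -> str:
        """Create a fresh random key when the page is viewed."""
        token = secrets.token_urlsafe(16)
        self._issued[token] = time.monotonic()
        return token

    def redeem(self, token: str) -> bool:
        """Accept a key once; unknown, reused, or expired keys are rejected."""
        issued_at = self._issued.pop(token, None)
        if issued_at is None:
            return False
        return time.monotonic() - issued_at <= self.ttl


def hidden_field(token: str) -> str:
    """Render the hidden input; token_urlsafe output is safe inside an attribute."""
    return f'<input type="hidden" name="form_key" value="{token}">'
```

Because the key is removed from the store on first use, replaying the same form submission over and over fails after the first attempt.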
Maybe I should look into implementing the question solution.
I’ve also seen blogs using the captcha but it’s too much of a hassle to require all users to do that. I really like Levi’s idea of the secret key. It wouldn’t stop the manual commenters, but it might put an end to automated spamming.
I use Kitten’s spam words in conjunction with wp-blacklist. I’m sure my site doesn’t get anywhere near the hits yours does, but since I set up the blacklist to delete any messages in the moderation queue, I’ve only had to manually delete one or two comments a day.
Plus, it adds a mass-editing option for deleting comments that will reap the comments for more spam words and add them to the list.
Might be something worth looking at.
The main thing I don’t like about using a blacklist is that you’re always playing catch-up with the spammers. They will always get new URLs through before you can add them to your blacklist.
Until I really trust the system, I prefer to review the moderation queue manually instead of having it auto-delete.
Another issue with wp-blacklist is that it blacklists not only the URLs but also the e-mail and IP address. Someone could use my e-mail address along with blacklisted URLs, and all of a sudden my e-mail address would be blacklisted.
I don’t like that.
Asking the reader an easy (for them) question makes sense and seems to be the method used by a number of big players (e.g., Yahoo has people look at an image and type the letters they see into a box before they can create an account). Once someone answers a question, you could put a cookie on their machine saying they are an authenticated user, so there is no reason to make them answer questions in the future. That would give you an informal registration system similar to the one Blogger uses, though less intrusive.
Stephen, Yahoo’s image method is a captcha. The captcha-to-cookie idea is good, though… you don’t have to create a formal account, which people seem resistant to doing, but you also don’t have to continue entering captcha keys or answering questions. Still mildly annoying for one-time commenters, but not much of a problem for regulars.
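One way to sketch the captcha-to-cookie idea is to sign the cookie with an HMAC, so a spammer cannot simply forge a "verified" cookie. The secret key handling, the `id:signature` format, and the commenter identifier are assumptions. Setting and reading the actual cookie is left to whatever web framework the blog uses.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real blog would load a fixed one from config.
SECRET_KEY = secrets.token_bytes(32)


def _sign(commenter_id: str, key: bytes) -> str:
    return hmac.new(key, commenter_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_verified_cookie(commenter_id: str, key: bytes = SECRET_KEY) -> str:
    """Cookie value to set after the commenter passes the captcha or question once."""
    return f"{commenter_id}:{_sign(commenter_id, key)}"


def is_verified(cookie_value: str, key: bytes = SECRET_KEY) -> bool:
    """True if the cookie was issued by this server and has not been altered."""
    commenter_id, sep, sig = cookie_value.rpartition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(sig, _sign(commenter_id, key))
```

Regulars who carry a valid cookie skip the check. Anyone else, including a bot that invents a cookie, sees the captcha or question again.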
The method I’ve implemented on my brothers’ blogs so far keeps out all automated spam while causing the least inconvenience to non-spammers.
1. Grandfathering: Anyone who has had a comment posted before (as determined by matching name, e-mail address, and URL against previously approved comments) is presumed able to post a new comment unless it contains blacklist words.
2. Blacklist words: Looks in the name, URL, e-mail address, and comment text for certain character combinations likely to appear in spam.
3. If the commenter is new or the comment contains blacklist words, the commenter is asked to confirm that they are not a spammer. (I do this by having them add a text phrase to their comment, which is then stripped out before the comment is posted.)
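The three steps above can be sketched as follows. This is not the commenter’s actual code. The blacklist fragments and the confirmation phrase are hypothetical placeholders, and "previously approved" is modeled as a set of (name, e-mail, URL) tuples.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Comment:
    name: str
    email: str
    url: str
    text: str


# Hypothetical blacklist fragments and confirmation phrase.
BLACKLIST = ("casino", "poker", "payday loan")
CONFIRM_PHRASE = "I am not a spammer"
CONFIRM_RE = re.compile(re.escape(CONFIRM_PHRASE), re.IGNORECASE)


def has_blacklisted_words(c: Comment) -> bool:
    """Step 2: scan name, e-mail, URL, and text for blacklisted fragments."""
    haystack = " ".join((c.name, c.email, c.url, c.text)).lower()
    return any(word in haystack for word in BLACKLIST)


def is_grandfathered(c: Comment, approved: Set[Tuple[str, str, str]]) -> bool:
    """Step 1: same name, e-mail, and URL as a previously approved comment."""
    return (c.name, c.email, c.url) in approved


def check_comment(c: Comment, approved: Set[Tuple[str, str, str]]) -> Tuple[str, str]:
    """Return ('post', text_to_publish) or ('ask_confirmation', original_text)."""
    if is_grandfathered(c, approved) and not has_blacklisted_words(c):
        return "post", c.text
    # Step 3: new or suspicious commenters must include the phrase,
    # which is stripped before the comment is published.
    if CONFIRM_RE.search(c.text):
        return "post", CONFIRM_RE.sub("", c.text).strip()
    return "ask_confirmation", c.text
```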
On my own blog, I use a combination of grandfathering, blacklist words, and comment moderation.
I like your spam-the-spammers idea. It would be nice to have a little automated utility (perhaps called Spamarang :) that looks for obvious spam attempts on your site and then spams the spammer’s site. I don’t think that’s the kind of traffic a spam site wants.
Or we could all put intense pressure on credit card companies not to deal with spammers, knowing that the bad will and publicity would force them to stop working with the spammers. Without that, there is no point in spamming.
OK, I guess those are too much work. On my site I think I’ll post a big sign saying “Don’t spam me, go to one of these sites with way more readers and traffic and spam them”. If I can convince the spam robots that there are juicier spam targets out there, then perhaps the spam bullies will leave me alone. I suppose the “I don’t have to outrun the bear, I just have to outrun the next guy” mentality might not be that constructive :-)
I created a key and only allowed comments that had the key, but the comment spam kept coming. They must use the form somehow instead of going directly to the comment posting script.
Hmm. The spammers may be looking for traffic, but do they want traffic that nobody sees?
Strip the URLs out of comment spam. Add them to a file as the SRC of image tags. At the very bottom of your page, include an invisible IFRAME that loads that file. The result is a hit to their server every time someone visits your blog, but they get no benefit from it because no one sees their page at all. (By loading their page as an image source, you prevent it from causing popups or other nasty things.)
Sure, one blogger alone doing this won’t make a difference. But if a lot of bloggers did it…
That’s an interesting idea, but they may not be smart enough to realize no one is seeing the traffic, and may just celebrate their newfound popularity.
I just found out about a plugin that checks various blacklists for the commenter’s IP address and rejects the comment if it is listed. We’ll see how well it works. The previous link discusses the code, and here is the source code for the plugin.
My blog runs on Drupal, which provides a ‘Subject’ field for comments. For some reason, every single spam comment I’ve received has had identical subject and name fields. I just added some logic to the spam module to treat any comment with identical subject and name fields as spam, and now I am 100% spam-free. The bonus is that these comments are still analyzed and fed into the Bayesian filter, so hopefully if the spammers grow wise to this, the filter will be well-trained enough to keep working.
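IP blacklists like these are commonly published as DNS-based blacklists (DNSBLs), so the sketch below assumes that is the kind of list the plugin checks. It is the general technique, not that plugin’s code. The zone name is a hypothetical placeholder. A lookup reverses the IPv4 octets, appends the list’s zone, and treats any successful DNS answer as "listed".

```python
import ipaddress
import socket

# Hypothetical DNSBL zone; substitute a list you are permitted to query.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """Listed addresses resolve (typically to 127.0.0.x); unlisted ones do not."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN and other resolver failures both land here, so a DNS outage
        # "fails open" and lets the comment through to the other checks.
        return False
```

Failing open is a deliberate choice here. It is better to let a comment fall through to moderation than to reject a legitimate one because the blacklist server was unreachable.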
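The identical-fields rule is small enough to sketch directly. This is not the Drupal spam module’s code. The form is modeled as a plain dictionary, and the field names `name` and `subject` are assumptions. Empty fields are deliberately not flagged, so a commenter who leaves both blank isn’t caught by accident.

```python
from typing import Dict


def same_field_rule(form: Dict[str, str], first: str = "name",
                    second: str = "subject") -> bool:
    """Return True (treat as spam) when two fields a human would fill differently match."""
    a = form.get(first, "").strip()
    b = form.get(second, "").strip()
    # Ignore empty values so a blank name and blank subject are not flagged.
    return bool(a) and a == b
```

Flagged comments can still be passed to the Bayesian filter as spam examples, which keeps the filter learning in case the bots stop filling both fields identically. The same function also works for a decoy field compared against a real one.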
I’ve started doing the same thing because some of the comments contain the same text, but it’s only catching 15-20 a day, meaning I still get 20-30.
I’m also confused by the tactics being used on another blog. They’re comment spamming the blog with URLs that don’t exist, using a person’s first and last name. I don’t understand how they can benefit from that.
I’m not sure you caught what I meant, because what you did doesn’t sound like what I did at all. I have two fields, A and B, that in a spam comment always contain the same text C. The text C varies so much that Bayesian training was going incredibly slowly, but it was universally true that in a single spam, A = B, despite C varying between postings.
For now, this is a 100% spam filter. If we’re being hit by the same bots, perhaps you could capitalize on it by introducing a new field and seeing if it gets the same input as another field.
I guess I misunderstood your initial solution. I just found text that was common to about 40% of the comment spam I get and filtered it out. If I had fields that were always identical, I would gladly use that as a 100% spam filter, but I do not have that luxury and I prefer not to add fields. I’m still looking for a better solution.
If they’re just bots, maybe they’re only looking for an input->submit element. Maybe you could fool them by using a JavaScript function to submit the form instead of a ‘submit’ button.
That’s not a bad idea but it would only be a matter of time until they started looking for a javascript function to submit.
On a brighter note, I’ve just figured out how to stop the current rash of comments, and so far it has been 100% effective.
I want to keep it on the down low because if they learn how I did it, they’ll probably try to thwart it, so I don’t want to spill the beans. It’s similar to what I was doing before, but I went a bit further. It’s not likely to be of much use for other sites since they may not be dealing with the same spambot, but if you’re interested post a comment here and I’ll e-mail you what I did.
I don’t know if I’ll understand it, but I’m curious what you did. Thanks