The referrer spam on my web stats page has all but ruined my ability to see who is linking to my site. As I see it, I have two options: 1) find a solution to referrer spam, or 2) find another way to track backlinks.
Find a solution to referrer spam
A benefit of stopping referrer spam is that you also cut the bandwidth the spammers consume, which can add up to a significant amount. To stop the ones that also scrape the site for e-mail addresses, you can add rules to .htaccess that block them, essentially maintaining a robots blacklist.
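The .htaccess approach itself is written as Apache directives, but the blacklist logic behind it is easy to see in isolation. Here is a minimal Python sketch, assuming a hand-maintained set of spam domains, that checks a request's referrer against that set. The domain names and the `is_spam_referrer` helper are hypothetical placeholders, not entries from any real blacklist.

```python
from urllib.parse import urlparse

# Hypothetical blacklist: in practice this is the list you keep extending
# by hand as new spam domains appear in your logs.
SPAM_DOMAINS = {"spam-example.com", "casino-example.net"}


def is_spam_referrer(referrer: str) -> bool:
    """Return True if the referrer's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SPAM_DOMAINS)


if __name__ == "__main__":
    for ref in ["http://www.spam-example.com/page", "http://blog.example.org/post"]:
        print(ref, "->", "blocked" if is_spam_referrer(ref) else "allowed")
```

Every new spam domain still has to be added to `SPAM_DOMAINS` by hand, which is the core weakness of any blacklist.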
The thing I don’t like about a blacklist solution is that you’re always one step behind. It’s not preventative; it’s endlessly reactive, forcing you to babysit your logs constantly and add new sites whenever they show up.
If you’re interested in this solution, here are several links that may prove useful.
How now BrownPau?
Caveat Lector’s spam category (especially Killing referrer spam)
Mark Pilgrim’s treatise on the subject
WebmasterWorld discussion about the issue
Web spider traps
Find another way to track backlinks
I like to know when someone links to my blog, and since the fundamental function of a search engine is to track links, it seems like there should be a simple solution, but I haven’t found one. I’ve already written about ways to track links to your site, but those typically show links in blogrolls. What I want to see are new links from people’s blog entries, or from sites that have recently added a link to mine. For instance, someone recently told me that the Gmail help site had a link to my site. I would have liked to find that out myself instead of relying on someone else to tell me.
Here’s what I’ve come up with. I use Bloglines to subscribe to Technorati and Feedster searches for links to my site. I’d like to do the same with other sites, but those are the only two that offer RSS feeds of their results.
With this setup, if a site in the Technorati or Feedster database links to my site, I see it in my Bloglines subscriptions. For now, the only way to use the other sites is to check them manually.
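Bloglines handles the polling, but the underlying idea is simple: a search-results feed is an RSS document whose items each point at a page that links to you. A minimal sketch, assuming an RSS 2.0 feed, that pulls out each item's title and link with Python's standard library. The sample XML is made up for illustration and is not real Technorati or Feedster output, and `feed_links` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample of an RSS 2.0 search-results feed; a real service's feed
# may use different elements or namespaces.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Links to dan.hersam.com</title>
    <item><title>A post that links here</title><link>http://blog.example.org/post-1</link></item>
    <item><title>Another mention</title><link>http://news.example.net/story</link></item>
  </channel>
</rss>"""


def feed_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in feed_links(SAMPLE_FEED):
        print(f"{title}: {link}")
```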
One problem I’ve found (which is the same problem that makes Google not-so-useful in this regard) is that internal links aren’t filtered, so you have to wade through links from your own site.
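If you process search results yourself rather than reading them in a feed reader, the internal links can be dropped by comparing each result's host with your own domain. A minimal sketch follows. `MY_DOMAIN`, `is_internal`, `external_only`, and the sample URLs are illustrative assumptions, and the dan.hersam.com path shown is made up.

```python
from urllib.parse import urlparse

MY_DOMAIN = "dan.hersam.com"  # change this to your own domain


def is_internal(url: str, domain: str = MY_DOMAIN) -> bool:
    """True if url points at domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def external_only(urls: list[str], domain: str = MY_DOMAIN) -> list[str]:
    """Keep only the URLs that come from other sites."""
    return [u for u in urls if not is_internal(u, domain)]


if __name__ == "__main__":
    results = [
        "http://dan.hersam.com/some-post",  # made-up internal URL
        "http://blog.example.org/post-1",
    ]
    print(external_only(results))
```

Matching on the hostname rather than a plain substring search avoids discarding an outside page that merely mentions your domain somewhere in its URL.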
The sites below show links to my site. To track links to your own site, change dan.hersam.com to your domain in the search.