
Published on December 12th, 2008
Many bloggers have had their content legitimately syndicated onto other sites and allow it as part of their promotion. But many have also encountered scrapers: sites that illegally duplicate entire blogs, copying new and old posts to populate their own pages with content. A WordPress plugin helps us fight back.
I recently had this problem with my main, personal blog. A less-than-ethical site (I’m not giving it the benefit of a link) had scraped every word from my RSS feed and republished it in a vBulletin forum as individual posts. (Intriguingly, every link in those posts had been replaced by a message telling readers to register for the forum in order to see the link.)
If you want content, I’ll give you content
This was the first all-out scraping I had experienced, so it was also my first opportunity to try a WordPress plugin that had been collecting dust for some time. Antileech, a plugin by Owen Winkler, replaces the content of your RSS feed (and posts) with a definable or generated message, but only for the user-agents or IP addresses you specify.
The plugin adds a small, non-intrusive image to your RSS feed content, which lets it record the location and user-agent of anyone accessing the feed. These user-agents are then listed on the plugin’s settings page, each with a checkbox for selecting which ones receive the alternative content. By default, normal content is sent to all user-agents, so who sees what is entirely under your control.
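For readers curious about the mechanism, here is a minimal Python sketch of the idea: log each user-agent and IP address that fetches the tracking image, then serve a substitute message only to the clients you have ticked. This is not Antileech’s code (the plugin is a PHP plugin for WordPress). The names `record_access`, `feed_item_content`, the blocklists, the message and the example user-agent and IP addresses are all hypothetical.

```python
from __future__ import annotations

# A minimal sketch, NOT Antileech's actual implementation.
# All names, user-agents and IP addresses below are hypothetical examples.

# user-agent -> set of IP addresses seen fetching the tracking image
seen_clients: dict[str, set[str]] = {}

# Clients ticked on the settings page. In the plugin nothing is blocked
# until you tick it; one user-agent and one IP are pre-filled here
# purely for illustration.
BLOCKED_USER_AGENTS: set[str] = {"ExampleScraperBot/1.0"}
BLOCKED_IPS: set[str] = {"203.0.113.7"}  # documentation-only address range

ALTERNATIVE_MESSAGE = (
    "This site is republishing content taken without permission "
    "from another blog."
)


def record_access(user_agent: str, ip: str) -> None:
    """Log a client that fetched the tracking image embedded in the feed."""
    seen_clients.setdefault(user_agent, set()).add(ip)


def feed_item_content(original: str, user_agent: str, ip: str) -> str:
    """Return the real post, or the substitute message for blocked clients."""
    if user_agent in BLOCKED_USER_AGENTS or ip in BLOCKED_IPS:
        return ALTERNATIVE_MESSAGE
    return original


if __name__ == "__main__":
    record_access("Mozilla/5.0", "198.51.100.2")
    record_access("ExampleScraperBot/1.0", "198.51.100.9")
    print(feed_item_content("<p>My new post</p>", "Mozilla/5.0", "198.51.100.2"))
    print(feed_item_content("<p>My new post</p>", "ExampleScraperBot/1.0", "198.51.100.9"))
    print(sorted(seen_clients))
```

With empty blocklists every reader would get the real post, which matches the plugin’s default of sending normal content to everyone.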
Some bloggers use the default generated message, which encourages anyone reading the alternative content to visit the originating site, while others write their own; some of these include profanity or obscene or illegal content to increase the chances of the scraper site being shut down for inappropriate content. I have chosen a simple message that clearly states the site is stealing content from elsewhere. Most scrapers are automated, and once the site owner has set the RSS scraper to work, they rarely look at the incoming content again, so the substitute message is likely to stay up unnoticed.
The plugin does not prevent your content from being scraped, of course, and some content has to be scraped before you can discover the illegal site in the first place. But once a scraper is discovered, the plugin offers a simple and immediate way to ensure it gets no further benefit from your future posts. Meanwhile, you can file a formal complaint with the site’s hosting company to have your stolen content removed.
Another plugin helps you detect scrapers. Several do similar jobs, but I use Digital Fingerprint by Kirk Montgomery. It places a user-defined string of text at the end of the first paragraph of each post in your RSS feed. Make this string unique, and you can create a Google Alert for it; the Alert will notify you whenever and wherever that string turns up. Most matches will be legitimate syndication, but now and then you’ll probably discover someone up to no good.
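Digital Fingerprint itself is configured through its settings page; the Python sketch below only illustrates the general idea under stated assumptions: generate a random, hard-to-collide token, then append it to the end of the first paragraph of each post before it goes into the feed. The function names, the `fp-` prefix and the assumption that posts are HTML with `</p>`-closed paragraphs are hypothetical, not taken from the plugin.

```python
import secrets


def make_fingerprint(prefix: str = "fp") -> str:
    """Create a random, effectively unique string to track (hypothetical format)."""
    # 8 random bytes -> 16 hex characters after the prefix and hyphen.
    return f"{prefix}-{secrets.token_hex(8)}"


def add_fingerprint(post_html: str, fingerprint: str) -> str:
    """Append the fingerprint to the end of the first paragraph of a post."""
    end = post_html.find("</p>")
    if end == -1:
        # No closing paragraph tag found: append to the end of the post instead.
        return f"{post_html} {fingerprint}"
    return f"{post_html[:end]} {fingerprint}{post_html[end:]}"


if __name__ == "__main__":
    print(add_fingerprint("<p>First.</p><p>Second.</p>", "fp-1234"))
    # prints: <p>First. fp-1234</p><p>Second.</p>
```

A random token is preferable to an ordinary phrase because it should never occur naturally, so an alert on it ought to surface mostly copies of your feed.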
Scraper sites (or “splogs”) will never be eradicated, and their numbers are growing. Tools like these give bloggers a viable, effective way to retaliate without spending hours scouring the net for duplicates of their content and struggling to contact those responsible to have it removed.