SEO WebMonkey

Web design & development with an ample sprinkle of SEO

Jamming the scraper signals

Many bloggers have seen their content legitimately syndicated on other sites, and permit it as part of their promotion. But many have also encountered scrapers: sites that illegally duplicate entire blogs, replicating new and old posts to populate their own pages with content. A WordPress plugin helps us fight back.

I recently had this problem with my main personal blog. A less-than-ethical site (I’m not giving it the benefit of a link) had scraped every word from my RSS feed and replicated it in a vBulletin forum as individual posts. (Intriguingly, every link in those posts was replaced by a message to register for the forum in order to see the link.)

If you want content, I’ll give you content

This was the first instance of all-out scraping I had experienced, so it was also my first opportunity to try out a WordPress plugin that had been collecting dust for some time. Antileech is a plugin by Owen Winkler that replaces the content within your RSS feed, and posts, with a definable or generated message, but only for user-agents or IP addresses you specify.

The plugin adds a small, non-intrusive image to your RSS feed content that enables it to record the location and user-agent of anyone accessing the feed. These user-agents are then listed in the plugin’s settings page, each with a checkbox so you can select which ones are to receive the alternative content. The default is to send normal content to all user-agents, so who sees what is entirely under your control.
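To make the idea concrete, here is a minimal Python sketch of the behaviour described above: record who fetches the feed, then serve the replacement message only to the user-agents or IP addresses that have been ticked. This is not Antileech's code (the plugin itself runs inside WordPress); the class name `FeedGuard`, its fields and the sample user-agent strings are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeedGuard:
    """Hypothetical stand-in for the behaviour described above, not the plugin's code."""

    replacement: str
    blocked_agents: set = field(default_factory=set)  # user-agents you have ticked
    blocked_ips: set = field(default_factory=set)     # IP addresses you have ticked
    seen: set = field(default_factory=set)            # every (user_agent, ip) pair observed

    def record(self, user_agent: str, ip: str) -> None:
        # Roughly what the tracking image achieves: note who fetched the feed.
        self.seen.add((user_agent, ip))

    def content_for(self, user_agent: str, ip: str, original: str) -> str:
        # Default is the normal content; only listed agents or IPs get the message.
        self.record(user_agent, ip)
        if user_agent in self.blocked_agents or ip in self.blocked_ips:
            return self.replacement
        return original


guard = FeedGuard(replacement="This content was taken from another site without permission.")
guard.blocked_agents.add("ExampleScraper/1.0")  # hypothetical user-agent string
print(guard.content_for("ExampleScraper/1.0", "192.0.2.10", "My real post"))
print(guard.content_for("FeedReader/2.3", "192.0.2.20", "My real post"))
```

Nothing is blocked until you add an entry yourself, which mirrors the plugin's default of sending normal content to everyone.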

The message that counts

Some bloggers choose the default generated message, which encourages anyone reading the alternative content to visit the originating site, while others write their own, some including profanity and obscene or illegal material to increase the chances of the scraper site being shut down for inappropriate content. I have chosen a simple message that clearly states the site is stealing content from elsewhere. Most scrapers are automated, and once the site owner has set the RSS scraper to work, they rarely look at the incoming content again.

The plugin does not prevent your content from being scraped, of course, and some has to be scraped for you to discover the illegal site in the first place. But once discovered, it offers a very simple and immediate means of ensuring the scraper gets no further benefit from future posts on your site. Meanwhile, you can file a formal complaint with the site’s hosting company to have your stolen content removed.

How to know when you’re being scraped

Another plugin helps to detect scrapers. There are several that do similar jobs, but I choose to use Digital Fingerprint by Kirk Montgomery. This places a user-defined string of text at the end of the first paragraph of each post in your RSS feed. Make this string unique and you can create a Google Alert for that particular string. The Alert will let you know whenever and wherever that string turns up. Most will be legitimate syndication, but now and then, you’ll probably discover someone is up to no good.
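The fingerprinting idea is simple enough to sketch in a few lines of Python. This is an illustration of the technique only, not the Digital Fingerprint plugin's code: the function names and the sample fingerprint string are made up, and it assumes paragraphs in the post text are separated by a blank line.

```python
def add_fingerprint(post_text: str, fingerprint: str) -> str:
    """Append the fingerprint to the end of the first paragraph.

    Assumption: paragraphs are separated by a blank line ("\n\n").
    """
    first, separator, rest = post_text.partition("\n\n")
    return f"{first.rstrip()} {fingerprint}{separator}{rest}"


def contains_fingerprint(page_text: str, fingerprint: str) -> bool:
    """Report whether a fetched page carries the fingerprint string."""
    return fingerprint in page_text


FINGERPRINT = "zq7-webmonkey-fp"  # hypothetical; pick something no one else would write

feed_item = add_fingerprint("First paragraph.\n\nSecond paragraph.", FINGERPRINT)
print(feed_item)
print(contains_fingerprint(feed_item, FINGERPRINT))
```

The more unusual the string, the fewer false alarms the alert will raise, since ordinary pages are unlikely to contain it by chance.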

A never ending battle

Scraper sites (or “splogs”) are never going to be eradicated, and their numbers are growing. Tools like these offer the blogger a viable and effective means to retaliate without losing hours scouring the net for duplicates of their content and struggling to contact those responsible to have the content removed.

This site attempts to break down personal, practical experience of web development and SEO into easily accessible, digestible articles and information.

Neil Dixon has been involved in web development and SEO since the late 1990s and is currently responsible for SEO for an online media entertainment network.

Views and opinions contained on this site are those of the article author(s) and do not reflect those of any organisation to which they are affiliated.
