<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SEO WebMonkey &#187; duplicate content</title>
	<atom:link href="http://seowebmonkey.com/stuff/duplicate-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://seowebmonkey.com</link>
	<description>Web design &#38; development with an ample sprinkle of SEO</description>
	<lastBuildDate>Tue, 01 Jun 2010 09:03:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>rel=canonical: trying to get Google to understand</title>
		<link>http://seowebmonkey.com/rel-canonical-trying-google-understand/</link>
		<comments>http://seowebmonkey.com/rel-canonical-trying-google-understand/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 20:49:41 +0000</pubDate>
		<dc:creator>ndixon</dc:creator>
				<category><![CDATA[Content Management]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[canonical]]></category>
		<category><![CDATA[duplicate content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://seowebmonkey.com/?p=120</guid>
		<description><![CDATA[I wrote a few days ago about how <a href="http://seowebmonkey.com/duplicate-content-web-content-applications/">duplicate content</a> can damage your site's visibility in search results. This first fix comes in the form of a simple line in your page header.]]></description>
			<content:encoded><![CDATA[<p>Back in February, <a rel="nofollow" href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">Google announced</a> it had begun supporting the rel=canonical hint in determining the definitive URL for an item of content. This sounds like a one-stop solution to all our duplicate content problems.</p>
<p>For example, my previous post can be reached by its full URL, but also (because of WordPress&#8217;s default settings) via</p>
<pre>http://seowebmonkey.com/?p=113</pre>
<p>If someone links to my post using that URL, it may also enter the search index alongside the full version, creating duplicate content.</p>
<p>Placing a <em>rel=canonical</em> instruction in your page&#8217;s header can resolve this problem before it occurs. It tells the search engine which URL to use as the definitive (canonical) URL for that page of content, regardless of what was used to get there.</p>
<p>Here&#8217;s an example for the above page:</p>
<pre style="font-size:0.9em;">&lt;link rel="canonical" href="http://seowebmonkey.com/duplicate-content-web-applications/"/&gt;</pre>
<p>This sits within the &lt;head&gt; section of your page&#8217;s HTML.</p>
<p>This link-tag is supported by Google, Yahoo, Ask.com and Microsoft Live Search.</p>
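<p>For sites that generate this tag in code, here is a minimal Python sketch. It assumes the application already knows each post&#8217;s single canonical permalink; the helper name <code>canonical_link_tag</code> is hypothetical and not part of WordPress or any library. The only real library call is the standard <code>html.escape</code>, used so that quotes or ampersands in a URL cannot break the <code>href</code> attribute.</p>

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build a <link rel="canonical"> element for the page's <head>.

    Hypothetical helper: canonical_url is assumed to be the one
    definitive permalink you want search engines to index.
    """
    # quote=True also escapes double quotes, so the URL cannot
    # close the href attribute early; & becomes &amp;.
    return '<link rel="canonical" href="{}"/>'.format(escape(canonical_url, quote=True))


if __name__ == "__main__":
    # Whether the page was reached via /?p=113 or its full permalink,
    # it should emit this same tag.
    print(canonical_link_tag(
        "http://seowebmonkey.com/duplicate-content-web-content-applications/"))
```

<p>The point of the design is that the tag depends only on the post, never on the URL used to request it, so every variant of the page names the same canonical URL.</p>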
<h2>Does it work?</h2>
<p>Google describes its support of rel=canonical as a &#8220;hint&#8221;. This means it will use the information to help determine a canonical URL, but reserves the right to do what it wants when it feels like it, presumably as a safeguard against errors that slip through the net. Search engines are rarely predictable, and no single method should be relied on to avoid duplicate content.</p>
<p>For content management systems in particular, this is an essential addition to the page output. It is not yet clear how quickly Google will replace duplicate URLs that are already in its index, and the tag alone does not let webmasters explicitly request removal of those URLs through the Google Webmaster Tools interface.</p>
<p>Direct removal of pages from Google&#8217;s search index will be covered in my next post.</p>
]]></content:encoded>
			<wfw:commentRss>http://seowebmonkey.com/rel-canonical-trying-google-understand/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Duplicate content in web applications</title>
		<link>http://seowebmonkey.com/duplicate-content-web-content-applications/</link>
		<comments>http://seowebmonkey.com/duplicate-content-web-content-applications/#comments</comments>
		<pubDate>Fri, 10 Apr 2009 10:37:05 +0000</pubDate>
		<dc:creator>ndixon</dc:creator>
				<category><![CDATA[Content Management]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[domains]]></category>
		<category><![CDATA[duplicate content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[SERPs]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[web developer]]></category>

		<guid isPermaLink="false">http://seowebmonkey.com/?p=113</guid>
		<description><![CDATA[Arguably, the most common obstacle to strong visibility in search results is inadvertently duplicated content.
]]></description>
			<content:encoded><![CDATA[<p>When first approaching a new SEO project, one of the very first research tasks is to assess the current status of the client site within the search engines &#8211; particularly, of course, Google. Right at the top of such tasks is a hunt for duplicate content and consequently, pages that have been filtered into the Supplementary Index.</p>
<h2>Defining duplicate content</h2>
<p>A search engine considers every unique URL to be the location of a single web page, and it learns about such URLs by following links. Duplicate content occurs when a search engine follows more than one unique URL to pages of predominantly similar content. These pages do not need to be identical, just very similar; the exact similarity threshold is something of a holy grail for optimisers and has not been clearly established.</p>
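<p>Search engines do not publish how they measure &#8220;very similar&#8221;. Purely to illustrate what a similarity ratio means, the Python sketch below uses one common textbook technique: comparing overlapping three-word sequences (&#8220;shingles&#8221;) using Jaccard similarity. This is an assumption for illustration, not Google&#8217;s algorithm, and the shingle size of 3 is arbitrary.</p>

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as one sequence.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of two texts' shingle sets (1.0 = identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    if not union:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    page_a = "Duplicate content occurs when a search engine follows more than one URL to the same page"
    page_b = "Duplicate content occurs when a search engine follows more than one URL to a similar page"
    print(round(similarity(page_a, page_b), 2))  # prints 0.65
```

<p>Two sentences that differ only in their last few words still score about 0.65 here; where a real search engine draws its line is exactly the unknown threshold mentioned above.</p>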
<p>Why is this important? A search engine will not display more than two pages from a website for a particular search. It automatically decides which page is the authoritative or originating version of that content, tucking the rest away at the end of the results: the Supplementary Index.</p>
<p>The damage to SEO has a number of causes, the two main ones being:</p>
<ol>
<li>The page you most want to surface may be inadvertently pushed into the Supplementary Index, giving searchers an inferior, poorly converting page to visit instead.</li>
<li>A major factor in search result visibility is the number (and quality) of links pointing to a page, but having multiple URLs for one page frequently dilutes the combined effect of back-links, because the links are split across different unique URLs.</li>
</ol>
<p>In my direct experience with sites, particularly over the past year, duplicate content pages cause a cumulative deterioration in the entire site&#8217;s visibility in SERPs (Search Engine Result Pages).</p>
<p>If you have a site that is failing to rank well for even the least competitive search terms, such duplicate content is likely to be a contributing factor, and you will struggle to establish a firm footing until this problem is resolved.</p>
<h2>Spotting the damage</h2>
<p>Discovering duplicate content is relatively straightforward. Perform a search for your website on Google, like this:</p>
<blockquote>
<p style="text-align: center;"><em>site:yourdomain.com keyword</em></p>
</blockquote>
<p>where <em>yourdomain.com</em> is your website&#8217;s domain and <em>keyword</em> is any term for which you are trying to be visible. Scroll to the very end of the results, and if you see a message like this:</p>
<p style="text-align: center;"><a href="http://seowebmonkey.com/wp-content/uploads/2009/04/supplimentary-index-message.png"><img class="aligncenter size-full wp-image-115" title="supplementary index message" src="http://seowebmonkey.com/wp-content/uploads/2009/04/supplimentary-index-message.png" alt="supplementary index message" width="502" height="29" /></a></p>
<p>you have duplicate content!</p>
<h2>The cause</h2>
<h4>Content Management</h4>
<p>If you have a duplicate content issue, I will bet your website runs on some form of content management system. Blog applications are notoriously prone to these problems, with supplementary listing pages such as category summaries, tag summaries, archive lists and search results all in danger of generating pages with very similar content.</p>
<p>In addition, some content systems do not choose a single means of generating the URL for an item of content, linking to pages and posts in slightly different ways from different parts of the system. They may also fail to redirect all URLs to the canonical URL for that page or post.</p>
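<p>As a rough sketch of what fixing this can look like inside an application, the Python below maps one URL variant (a WordPress-style <code>?p=</code> post ID) onto the post&#8217;s canonical permalink and answers the variant with a permanent 301 redirect. The lookup table and the function names are hypothetical; a real CMS would consult its own database and handle many more URL forms.</p>

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table: WordPress-style post IDs -> canonical permalinks.
CANONICAL_BY_ID = {
    "113": "http://seowebmonkey.com/duplicate-content-web-content-applications/",
}


def canonical_url(requested):
    """Return the canonical URL for a requested URL, or None if unknown."""
    query = parse_qs(urlsplit(requested).query)
    post_id = query.get("p", [None])[0]
    if post_id is not None:
        # ?p=113 style URL: look up the post's permalink.
        return CANONICAL_BY_ID.get(post_id)
    # Assumption for this sketch: any other URL is already the permalink.
    return requested


def respond(requested):
    """Return an (HTTP status, Location header or None) pair."""
    target = canonical_url(requested)
    if target is None:
        return 404, None
    if target != requested:
        # Permanent redirect, so links and indexing consolidate on one URL.
        return 301, target
    return 200, None


if __name__ == "__main__":
    print(respond("http://seowebmonkey.com/?p=113"))
```

<p>The redirect complements the canonical tag: the tag is a hint, whereas a 301 means the duplicate URL never serves the content at all.</p>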
<h4>Identical meta data</h4>
<p>It is also very common to come across sites where many &#8211; sometimes even all &#8211; of their pages have identical titles and META descriptions. This is particularly prevalent with blogs, and in the small business space, where a company feels it must put its company name and corporate blurb at the top of every page, so the page no longer accurately reflects its actual content.</p>
<p>Page titles are very important in establishing the context of a page, and must be unique in order for the search engine to properly understand the page content.</p>
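<p>One way to audit this is to group your pages by title and by META description. The Python sketch below assumes you have already collected each page&#8217;s title and description (for example, from a crawl of your own site) into a dictionary keyed by URL; the function name and the sample pages are invented for illustration.</p>

```python
from collections import defaultdict


def find_duplicate_meta(pages):
    """Group URLs that share an identical title or META description.

    pages maps URL -> (title, description). Returns a dict mapping
    ("title" or "description", normalised text) -> sorted list of URLs,
    keeping only values that appear on more than one page.
    """
    groups = defaultdict(list)
    for url, (title, description) in pages.items():
        # Collapse whitespace and ignore case so trivial differences still match.
        groups[("title", " ".join(title.split()).lower())].append(url)
        groups[("description", " ".join(description.split()).lower())].append(url)
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}


if __name__ == "__main__":
    pages = {  # invented sample data
        "http://example.com/": ("Acme Widgets Ltd", "Acme makes widgets."),
        "http://example.com/about/": ("Acme Widgets Ltd", "Acme makes widgets."),
        "http://example.com/contact/": ("Contact Acme", "Call or email Acme."),
    }
    for (field, text), urls in find_duplicate_meta(pages).items():
        print(field, repr(text), urls)
```

<p>Every group it reports is a set of pages whose titles or descriptions need rewriting to reflect their own content.</p>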
<h4>Too little unique textual content</h4>
<p>Pages with little unique text can come to be regarded as duplicates because most of their content is the same as on every other page of the site. On such sparse pages, the site-wide navigation, footer information and other generic text can make up most of the textual content.</p>
<h4>Search, archive and summary pages</h4>
<p>As mentioned above, pages that summarise and list snippets of other content can easily appear to a search engine to be very similar but have unique URLs.</p>
<h4>www and non-www domains</h4>
<p>This one often surprises web developers, but www.yoursite.com is, to a search engine, a different website to yoursite.com. This means that if all your content is reachable via both those versions, your entire site is seen as being a duplicate!</p>
<p>It does not matter which version you choose, but have one of them permanently (HTTP 301) redirected to the other.</p>
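<p>How you set up the redirect depends on your server. To stay in one language here, the sketch below shows the idea as Python WSGI middleware: any request whose host is not the version you chose is answered with <em>301 Moved Permanently</em>, pointing at the same path and query on the chosen host. The host name <code>www.yoursite.com</code> is a placeholder, and the sketch assumes you picked the www version.</p>

```python
from urllib.parse import quote

# Placeholder: the host name you chose as canonical (here, the www version).
CANONICAL_HOST = "www.yoursite.com"


def canonical_host_middleware(app):
    """Wrap a WSGI app so requests for any other host get a 301 redirect."""

    def wrapper(environ, start_response):
        # Strip any port and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            location = "{}://{}{}".format(
                environ.get("wsgi.url_scheme", "http"),
                CANONICAL_HOST,
                quote(path or "/"),  # re-encode the path for the header
            )
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # 301 tells search engines the move is permanent.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Requests already on the canonical host pass straight through.
        return app(environ, start_response)

    return wrapper
```

<p>Wrapping your existing application once, as <code>app = canonical_host_middleware(app)</code>, is enough; preserving the path and query means every old link lands on the equivalent page rather than the home page.</p>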
<h2>Repairing the damage</h2>
<p>Now that I have covered most of the indicators and causes, how about a means of fixing the problem? Watch out in a few days for specific techniques to repair and remove duplicated content within search engines.</p>
]]></content:encoded>
			<wfw:commentRss>http://seowebmonkey.com/duplicate-content-web-content-applications/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Overcoming duplicate content filters &#8211; back-links are everything</title>
		<link>http://seowebmonkey.com/duplicate-content-filters-backlinks/</link>
		<comments>http://seowebmonkey.com/duplicate-content-filters-backlinks/#comments</comments>
		<pubDate>Mon, 29 Dec 2008 11:34:39 +0000</pubDate>
		<dc:creator>ndixon</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Link building]]></category>
		<category><![CDATA[authority]]></category>
		<category><![CDATA[backlinks]]></category>
		<category><![CDATA[duplicate content]]></category>
		<category><![CDATA[experiment]]></category>

		<guid isPermaLink="false">http://seowebmonkey.com/?p=86</guid>
		<description><![CDATA[Gaining Google visibility when your website content solely duplicates existing content can be tough. Here is one method of overcoming the duplicate competition.]]></description>
			<content:encoded><![CDATA[<p>Now I bet you are wondering &#8220;Why on earth would you want to duplicate content that already exists?&#8221; Well, putting the darker flavours of SEO/SEM to one side, many sites in the web2.0 world  aggregate content legitimately scraped from other sites. So to understand how the duplicate content filter can be overcome, I thought a little experiment was in order.</p>
<h2>The Duplicate Content Filter</h2>
<p>Google dislikes presenting multiple search results that contain the same content. Instead, it decides which of the duplicates in its index is the most authoritative original source of that content, and presents just that one. The rest are filtered into the supplementary index &#8211; the extra content you can reveal when you see a message like this at the end of your search results:</p>
<blockquote><p><em>In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed.<br />
If you like, you can <span style="color: #3366ff;"><span style="text-decoration: underline;">repeat the search with the omitted results included</span></span>.</em></p></blockquote>
<h2>An experiment to become the authority</h2>
<p>I wanted to test the power of backlinks as an indicator of authority above all else. The outline of the experiment is straightforward:</p>
<ol>
<li>Create a brand new website</li>
<li>Fill it with content that already exists on other, more established websites</li>
<li>Create back-links pointing to it</li>
<li>Do not market the site in any other way</li>
</ol>
<p><em>I am not going to link to the site itself here, as I am now isolating it from normal, organic link targets to perform another experiment.</em></p>
<p>Content for the site was selected from some freely available PLR (Private Label Rights) articles, which can be legally reproduced. Such articles have generally already been published elsewhere too. In my case, a search for specific chunks of article text showed that most articles had already been published across 6-10 other websites, some new, some quite established.</p>
<h4>Getting indexed</h4>
<p>Adding a link to the footer of a very healthy blog got the brand new domain added to the Google index within 24 hours.</p>
<h4>Building back-links</h4>
<p>In addition to the main purpose of this test, I also used it to try out <a title="Link building with Linkvana" href="http://seowebmonkey.com/go/linkvana">link building service Linkvana</a>. A full review of my experience with Linkvana will be here soon, but in a nutshell, it provides the ability to create unique backlinks from a plethora of specially managed blogs, without the potentially damaging drawbacks of using a link-farm.</p>
<p>Over a period of three weeks, I used Linkvana exclusively to create just 15 back-links to the new content, some deep-linking to inner pages and some pointing at the home page.</p>
<h4>The outcome</h4>
<p>After just a week of link-building, searching for specific chunks of my published PLR text returned my site <strong>at the top of the search results</strong>, with all the other pages containing the same PLR article pushed into the supplemental index as duplicate content. This was despite all the other sites having the advantages of greater domain age and of having published that content first.</p>
<h2>The conclusion</h2>
<p>Duplicate content is one of the most discussed, and most misunderstood, aspects of Google&#8217;s search algorithm, but it can be overcome with pure link building.</p>
<p>The number of quality links pointing at duplicate content seems to be the primary metric for assessing that site&#8217;s authority. Of course, this assumption must be weighed against any existing authority held by other sites with the same content.</p>
]]></content:encoded>
			<wfw:commentRss>http://seowebmonkey.com/duplicate-content-filters-backlinks/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>
