SEO WebMonkey

Web design & development with an ample sprinkle of SEO

Duplicate content in web applications

Arguably, the most prevalent obstacle to strong visibility within search results is inadvertently duplicated pages of content.

When first approaching a new SEO project, one of the very first research tasks is to assess the current status of the client site within the search engines – particularly, of course, Google. Right at the top of such tasks is a hunt for duplicate content and consequently, pages that have been filtered into the Supplementary Index.

Defining duplicate content

A search engine considers every unique URL to be the location of a single web page. It learns about such URLs by following links. Duplicate content occurs when a search engine follows more than one unique URL and each takes it to a page of predominantly similar content. These pages do not need to be identical, just very similar (a defined ratio is something of a holy grail for optimisers and is not clearly established).
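As a rough illustration of "very similar", the sketch below compares two pieces of page text by the overlap of their three-word sequences (shingles). This is a common textbook approach, not a description of how any search engine actually measures duplication; the function names and the sample text are made up for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical page text: identical apart from the final word.
page_one = "Blue widgets for sale. Free delivery on all blue widgets this week."
page_two = "Blue widgets for sale. Free delivery on all blue widgets this month."
print(f"{similarity(page_one, page_two):.2f}")
```

A score close to 1.0 suggests two URLs are serving much the same text. Where to draw the line is a judgment call, since no official ratio is published.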

Why is this important? A search engine will not display more than two pages from a website for a particular search. It makes an automated decision of which pages are the authoritative or originating version of that content, tucking the rest away at the end of the results: the Supplementary Index.

The damage to SEO comes from a number of factors, the two main causes being:

  1. The page you most want to surface may become inadvertently pushed into the Supplementary Index, thus giving searchers an inferior, poorly converting page to visit.
  2. A major factor in search result visibility is the number (and quality) of links pointing to a page, but multiple URLs for the same page frequently dilute the combined effectiveness of back-links, because the links point to different unique URLs.

From my direct experience with sites over the past year in particular, duplicate content pages result in a cumulative deterioration of the entire site’s visibility in SERPs (Search Engine Results Pages).

If you have a site that is failing to rank well for even the least competitive search terms, such duplicate content is likely to be a contributing factor, and you will struggle to establish a firm footing until this problem is resolved.

Spotting the damage

Discovering duplicate content is relatively straightforward. Perform a search for your website on Google, like this:

site:yourdomain.com keyword

where yourdomain.com is your website’s domain, and keyword is any term for which you are trying to be visible. Scroll to the very end of the results, and if you see a message like this:

[Image: supplementary index message]

you have duplicate content!

The cause

Content Management

If you have a duplicate content issue, then I will bet your website runs on some form of content management system. Blog applications are notoriously prone to these problems, with supplementary listing pages such as category summaries, tag summaries, archive lists, and search results all in danger of generating pages with very similar content.

In addition, some content systems do not choose a single means of generating a URL for an item of content, linking to pages and posts in slightly different ways from different parts of the system. They may also fail to redirect all URLs to the canonical URL for that page or post.
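One way to spot this in a list of crawled URLs is to reduce each URL to a comparable form and see which ones collapse together. The sketch below is a minimal example under stated assumptions: the list of ignorable parameters is hypothetical, and treating a trailing slash as insignificant will not be right for every system.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalise(url: str) -> str:
    """Reduce a URL to a comparable form for spotting likely duplicates."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # Drop ignorable parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def group_variants(urls):
    """Map each normalised URL to the raw URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    # Only groups with more than one raw URL are potential duplicates.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Any group this returns is a candidate for a single canonical URL, with the other variants redirected to it.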

Identical meta data

It is also very common to come across sites where many of the pages, sometimes even all of them, share identical titles and META descriptions. This is particularly prevalent once again with blogs, and also in the small business space, where a company feels it must apply its company name and corporate blurb at the top of every page, thus not accurately reflecting that page’s actual content.

Page titles are very important in establishing the context of a page, and must be unique in order for the search engine to properly understand the page content.
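A quick audit for shared titles and descriptions can be sketched with Python's standard HTML parser. The example assumes you already have each page's HTML in a dictionary keyed by URL; the class and function names are invented for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and META description from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def shared_metadata(pages):
    """Group URLs by (title, description); keep groups of two or more."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = HeadExtractor()
        parser.feed(html)
        groups[(parser.title.strip(), parser.description)].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}
```

Every group returned is a set of pages presenting the same title and description, each of which probably needs metadata written for its own content.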

Too little unique textual content

Pages with little unique text content can come to be regarded as duplicates because most of the content there is similar to everything else on the site. In these sparse content examples, the site-wide navigation, footer information, and other generic text can form the majority of the textual content.
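To get a feel for how sparse a page is, you can estimate what share of its words appears nowhere else on the site. The sketch below assumes the text has already been extracted from the HTML and split into blocks (one string per paragraph, menu, or footer); that input format and the function name are assumptions made for the example.

```python
from collections import Counter


def unique_ratios(pages: dict[str, list[str]]) -> dict[str, float]:
    """For each page, return the share of its words found in blocks
    that appear on no other page.

    pages maps a URL to that page's text blocks, already extracted
    from the HTML.
    """
    # Count how many pages each distinct block appears on.
    seen_on = Counter(block for blocks in pages.values() for block in set(blocks))
    ratios = {}
    for url, blocks in pages.items():
        total = sum(len(block.split()) for block in blocks)
        unique = sum(len(block.split()) for block in blocks if seen_on[block] == 1)
        ratios[url] = unique / total if total else 0.0
    return ratios
```

A page with a low ratio is mostly navigation and other shared text, which makes it a likely candidate for being treated as a duplicate.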

Search, archive and summary pages

As mentioned above, pages that summarise and list snippets of other content can easily appear to a search engine to be very similar to one another, yet each has its own unique URL.

www and non-www domains

This one often surprises web developers, but www.yoursite.com is, to a search engine, a different website to yoursite.com. This means that if all your content is reachable via both those versions, your entire site is seen as being a duplicate!

It does not matter which you choose, but have one of those URL versions permanently redirected (with a 301 redirect) to the other.
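In practice this redirect is usually configured in the web server, but the logic is simple enough to sketch as a function. The example below is illustrative only: the function name is invented, www.yoursite.com stands in for whichever version you prefer, and ports in the host name are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, preferred_host: str = "www.yoursite.com"):
    """Return (301, target) if url is on the other host variant, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host.removeprefix("www.")
    if host == preferred_host or host not in {bare, "www." + bare}:
        return None  # already canonical, or a host we do not manage
    # Keep the path and query so each page redirects to its own counterpart.
    target = urlunsplit(
        (parts.scheme, preferred_host, parts.path or "/", parts.query, "")
    )
    return 301, target


print(canonical_redirect("http://yoursite.com/page?x=1"))
```

Redirecting each page to its own counterpart, rather than sending everything to the home page, matters because back-links to deep pages then carry over to the matching canonical URL.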

Repairing the damage

Now that I have covered most of the indicators and causes, how about a means of fixing the problem? Watch out in a few days for specific techniques to repair and remove duplicated content within search engines.

4 Responses to “Duplicate content in web applications”

  1. Carey says:

    Another consideration for duplicate content is that which appears in open source scripts (or even in house scripts that are widely used).

    Pages such as a Privacy Policy, Contact pages, help pages etc. may have reused content that appears on many other sites across the web.

    • ndixon says:

      In such instances, preventing indexing of this content is vital to ensure they do not inadvertently turn up in search results, and, of course, to avoid duplicate content problems.

      Using the “NOINDEX” Robots meta tag or disallowing the pages in your robots.txt will prevent these pages showing up.

  2. biz says:

    What is the risk of publishing articles on isnare and ezine articles? Should I stop publishing them?

    • ndixon says:

      So long as you are not publishing the same content both on article sites and on your website, you’ll be fine. Create unique website content, then more unique articles for distribution across article networks that link in to your original website content.

This site attempts to break down personal, practical experience of web development and SEO into easily accessible, digestible articles and information.

Neil Dixon has been involved in web development and SEO since the late 1990s and is currently responsible for SEO for an online media entertainment network.

Views and opinions contained on this site are those of the article author(s) and do not reflect those of any organisation to which they are affiliated.
