Friday, 30 December 2011

How-to: The Duplicate Content Myth

I came across a post earlier in the year by Panos over on Wordpress Blogger Tips, exploding the myth about duplicate content, with particular reference to blogs hosted on Wordpress.com.
 
"Duplicate content means substantive blocks of material that exist more than once in the web – in other words, substantive blocks of material that can be accessed via more than one URL. For example, if you write a lengthy reply in someone else’s blog or in a forum, and then publish this reply as a post in your own blog (with or without changes), that’s duplicate content."

The myth in question, peddled by “SEO” (Search Engine Optimization) sites, is that duplicate content on your blog is a cardinal sin and that the wrath of Google will fall upon you. Quoting the head of Wordpress Support: "I wouldn’t believe anything written on any SEO blog. Ever..."
Panos also checks Google's own guidance:
  1. "Google wants to serve up unique results and does a great job of picking a version of your content to show if your site includes duplication. If you don’t want to worry about sorting through duplication on your site, you can let us worry about it instead."
  2. Duplicate content doesn’t cause your site to be penalized. If duplicate pages are detected, one version will be returned in the search results to ensure variety for searchers.
  3. In the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index.
  4. In the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.
  5. Only when there are signals pointing to deliberate and malicious intent, occurrences of duplicate content might be considered a violation of the webmaster guidelines."
  6. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments.
Part of the alarmist myth concerns the way that Wordpress handles Category pages. Wordpress Support again: "You are fine – it’s how all WP blogs work and Google likes the way we do it."

Technically your category and other index pages are duplicate content, because those URLs lead to the same content as the post URLs; but contrary to the myth, categorizing your posts intelligently can improve your standing. Google will simply pick one of the several possible URLs when returning results for your content, demoting the rest to its supplemental index, so that a search turns up a page of different results, not the same article under ten different tags or URLs. The search engine knows it's the same article.
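Under the hood, the signal that tells a search engine which version to keep is the rel="canonical" link tag in a page's head, which Wordpress.com pages generally emit for you. Here's a minimal sketch, using only the Python standard library, of how you could check which canonical URL a page declares; the example address at the bottom is hypothetical.

```python
# Minimal sketch: read the rel="canonical" tag a page declares, which is
# how a blog tells search engines which of several duplicate URLs (post
# permalink, category page, tag page) is the preferred one to index.
from html.parser import HTMLParser
from urllib.request import urlopen


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical" and not self.canonical:
            self.canonical = attrs.get("href")


def canonical_url(url: str) -> str | None:
    """Return the canonical URL the page at `url` declares, if any."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical post: however you reach the page, it should declare one
# preferred URL, so search engines index a single copy.
print(canonical_url("https://example.wordpress.com/2011/12/30/some-post/"))
```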

The same isn't always true of self-hosted blogs and other sites, where customisation by the individual webmaster can mess with the SEO-friendliness of the site. Wordpress.com blogs are SEO-effective because a) they've been at this a long time, standardising URLs and sitemaps into consistent structures, and b) the search engines know exactly how they work, with consistency and accuracy.
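To see where a self-hosted setup goes wrong, consider how easily extra URLs for the same content pile up: tracking query strings, mixed-case hostnames, trailing-slash variants. Here's a rough sketch of the kind of URL normalisation a standardised platform does for you (the tracking parameter names below are common conventions, not an exhaustive list):

```python
# Rough sketch: collapse common duplicate-URL variants onto one form.
# Each variant is a separate URL serving identical content, which is
# exactly the duplication a standardised platform avoids creating.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def normalise(url: str) -> str:
    """Return one canonical form for common variants of the same URL."""
    parts = urlparse(url)
    # Drop tracking parameters but keep meaningful ones (e.g. ?p=123).
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in TRACKING_PARAMS]
    # Lower-case the host and strip any trailing slash from the path.
    path = parts.path.rstrip("/") or "/"
    return urlunparse((parts.scheme, parts.netloc.lower(), path,
                       parts.params, urlencode(query), ""))


# Three spellings, one article:
assert (normalise("http://Example.com/my-post/?utm_source=feed")
        == normalise("http://example.com/my-post/")
        == normalise("http://example.com/my-post"))
```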

So long as you don't systematically post the same articles on different sites, don't systematically re-publish older posts, and don't systematically copy articles from other online sources, the search engines won't get upset.

I'd go further: it's the word 'systematic' that's key. My co-authors and I regularly syndicate content from our individual blogs onto Everything Express and it doesn't appear to do us any harm in the search engines. If it did, heavily syndicated freelance writers the world over would be in big trouble. If we set up a link wheel and copied content wholesale across dozens of sites, that would naturally downgrade our search effectiveness (not to mention fragment the audience, viewing stats being another indicator of rank for search purposes).

To quote Bobby McFerrin, "don't worry, be happy." RC