What is duplicate content? What does Google do to pages and sites with duplicate content? Is there a penalty for having duplicate content?
Earlier this week the Google Webmaster Central Blog answered these and other questions about duplicate content. If you remember, I had my own problems with posts here going supplemental because the WordPress feed for each post was being seen as duplicate content.
What Is Duplicate Content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.
They also mention that most of the time duplicate content is not malicious in nature, though obviously in some cases it is designed to manipulate rankings. I think it’s important to note that Google refers to ‘blocks of content’ rather than entire pages, and also that duplicate can mean ‘appreciably similar.’
They additionally point out that you shouldn’t worry about occasional quoted snippets being flagged as duplicate content. So the quote I’ve taken above from their blog shouldn’t cause any problems here, and neither should the quotes I’ve pulled further down in this post.
Why Does Google Care About It?
They care for one very simple reason. Well, two, but they only address one. Think for a moment as a searcher. How would you feel if the results for your query were a list of 10 identical pages? How would you feel if, after clicking one of the links and reading the article, you found that the other nine results were all the same article?
You probably wouldn’t be too happy, and if it kept happening over and over again with different queries you’d probably switch to another search engine. There just isn’t any reason to show more than one page with the same content.
The other reason, not explicitly mentioned, is that some duplicate content is the work of spammers. Google, like all search engines, does not like to be manipulated. They would prefer to have only quality, relevant pages appear in search results. If spammers are doing things like scraping content, it’s of concern to Google.
What Does Google Do About Duplicate Content?
Perhaps the question everyone is most interested in.
This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list.
Notice the use of the word ‘filtering’ and no mention of a penalty. When Google finds duplicate content they will decide which version of the page to display in results. Matt Cutts said much the same in one of his video posts late this summer.
The Webmaster Central blog uses the example of two pages on your site, but the situation is the same for duplicate content that appears across domains. If, for example, you write an article and syndicate it elsewhere, you leave it in Google’s hands which version will appear in the SERPs. Likely it will be the one on the site they consider to have the most authority and trust.
In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved.
No specific mention of a penalty, but it does sound like if they think you are manipulating them, your page or site will get penalized. What else could they mean by ‘make appropriate adjustments’? I think the important takeaway is: don’t use duplicate content in manipulative ways. Remember, though, that Google considers most duplicate content to come with no malicious intent. You should know if you’re trying to manipulate things, and if you’re not, you probably don’t need to worry about penalties.
…so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.
Not so horrible. The page you want to show may not be the one they choose to display, and the article you syndicated may never show up in results. There are certainly worse things in life. But do keep in mind that if you are serving the same or similar pages at different URLs, it’s in your best interest to let Google know which page you want indexed by blocking the other versions via robots.txt or a noindex meta tag.
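As a sketch, suppose your CMS serves printer-friendly copies of every article under a /print/ path (a hypothetical path; substitute whatever your system actually produces). A robots.txt rule keeping crawlers away from those copies might look like this:

```
# robots.txt — keep crawlers out of the printer-friendly duplicates
# (/print/ is just an example path for this sketch)
User-agent: *
Disallow: /print/
```

Alternatively, you can put `<meta name="robots" content="noindex">` in the head of the duplicate page itself, which lets Google crawl the page but keeps it out of the index.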
How Can Webmasters Address Duplicate Content Issues?
The Webmaster Central Blog goes on to list a few things you can do to address the issue.
- Use 301s
- Syndicate carefully
- Minimize boilerplate repetition
- Understand your CMS
And my favorite
- Don’t worry be happy – Don’t fret too much about sites that scrape…Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google.
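On the ‘Use 301s’ point: if your site runs on Apache, a permanent redirect from a duplicate URL to the preferred one can be a single line in your .htaccess file. The paths and domain below are hypothetical, purely to illustrate the shape of the rule:

```
# .htaccess — assumes Apache with mod_alias enabled
# Send the duplicate URL permanently to the preferred version
Redirect 301 /2006/11/dup-article.html http://www.example.com/2006/11/article.html
```

A 301 signals that the move is permanent, so search engines consolidate on the destination URL instead of splitting between the duplicates.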
I’m sure that will be good news to many webmasters and site owners.
The Webmaster Central Blog goes into a little more detail about each of the points listed above and also mentions a few more, so have a look at their post, Deftly dealing with duplicate content, if you have more questions about how to handle dup content on your site.
Duplicate content is certainly something to be aware of, and if you are serving appreciably similar content across pages of your site you should take some simple steps to remedy the situation. But often the situation isn’t as dire as some would have you believe. Serious issues like the ones I was having with this blog come more from having a majority of your pages seen as duplicate. If it’s only a few pages, it’s most likely not a problem. But do have a look at your content management system if you use one.