Questions often arise about duplicate content. What is it? How much content needs to be the same before it’s considered duplicate? Will having duplicate content hurt your site, and is there a penalty associated with it? All good questions, and sadly there isn’t much concrete evidence behind some of the answers. While I always recommend having unique and original content, I think much about duplicate content is a bit overblown.
A definition of duplicate content would seem obvious, but I don’t think it’s so clear cut. It depends on what is considered content. Is it just the words in the center of the page, or are your header and navigation taken into account as well? Also, how much duplication is needed before something is considered duplicate by a search engine? 100%, sure; 80%, maybe; 30%, probably not.
I’m of the opinion that the whole page is taken into account when looking for duplicate content. If not the whole page, then certainly more than the main content area. I think all of the code, or at least much of it, is taken into account. There are simply far too many good reasons to use the same content on different sites. It’s perfectly valid for more than one site to pull in the same news and blog feeds or to use syndicated articles, and if only the words themselves were taken into account there would be a lot of duplicate pages. However, if you also take into account the code around those articles and feeds, they’re no longer 100% the same. The consensus online seems to be that anywhere from 15% to 30% unique content and you’re fine. I think the page structure outside the main content will generally be enough to get that 30%, which is why the same content on different sites often won’t be considered duplicate.
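To make the idea concrete, here is a rough sketch of one way page similarity can be estimated: Jaccard similarity over word shingles. This is an illustration, not how any search engine actually works, and the example pages and threshold intuition are my own assumptions.

```python
# Sketch: estimating how "duplicate" two pages are by comparing
# overlapping word shingles. Purely illustrative -- search engines
# don't publish their actual methods or thresholds.

def shingles(text, k=3):
    """Break text into a set of overlapping k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(page_a, page_b, k=3):
    """Jaccard similarity: shared shingles / total distinct shingles."""
    a, b = shingles(page_a, k), shingles(page_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# The same syndicated article wrapped in two different page structures.
article = "the quick brown fox jumps over the lazy dog"
page_one = "Site One navigation " + article + " Site One footer links"
page_two = "Another Site menu " + article + " different sidebar and footer"

score = similarity(page_one, page_two)
print(f"similarity: {score:.2f}")
# The shared article pushes the score up, but the differing headers,
# navigation, and footers pull it well below 1.0.
```

The point of the sketch is the last comment: two pages carrying the identical article still score well short of identical once the surrounding page structure is counted, which is the argument above about the 15% to 30% of unique content.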
Now if you were to reuse the same content on the same domain, then yes, it’s duplicate, and there’s every reason to believe that one or both of those pages will be dropped from the index. Mirrored sites would also come across as duplicate, given the reuse of everything. Shopping sites where product pages are served in several places, with only the URL changing depending on the current category, have suffered from appearing as duplicate content. Having two pages on your site delivering the same content with only a few keywords changed would be duplicate. Take those same two pages, though, and put them on different sites and I think the situation is very different.
Syndicated articles concern webmasters, since on the surface the same article in several places would seem to be 100% duplication. Take the page beyond the content itself, though, and it’s probably nowhere near exact. When I started this blog it was naturally a pretty empty place, so I downloaded a number of articles just to have something here at first. Since the beginning, several of those articles have brought search traffic and continue to do so. Where’s the penalty? The feed for this blog can be pulled into any site, and while I’m not vain enough to think it’s happening often, I know it has been incorporated into other sites. Yet the pages here are indexed, and many rank well for a variety of search terms.
If you’re concerned, though, about possibly serving duplicate content, there are things you can do. Start by writing unique page titles. We all know (or should know) that page titles are important to search engines; here’s another example of their importance. Then add some copy above and below the article. You can’t change the article itself, but you can add some introductory words and your own conclusion.
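The steps above can be sketched as a simple wrapper. Every name and string here is a hypothetical placeholder; the idea is just that the title, introduction, and conclusion are yours while the article body stays untouched.

```python
# Sketch: surrounding a syndicated article with unique elements.
# All function names and strings are illustrative assumptions.

def wrap_article(title, intro, article_html, conclusion):
    """Wrap untouched syndicated content in a unique title and copy."""
    return (
        f"<title>{title}</title>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{intro}</p>\n"
        f"{article_html}\n"           # the article itself is unchanged
        f"<p>{conclusion}</p>"
    )

page = wrap_article(
    "My Take on Duplicate Content",        # your unique page title
    "Here is an article I found useful:",  # your own introduction
    "<p>...syndicated article body...</p>",
    "In short, I mostly agree with the author.",  # your own conclusion
)
```

Even this small amount of surrounding copy, combined with your site’s own header and navigation, is what makes the resulting page differ from every other site running the same article.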
So what happens when two pages are considered duplicates of each other? Again there’s a lot of disagreement, but the general thought is one of several things. In minor cases, some of the dupe pages will simply be filtered out of top results and receive a lower rank than the page the search engine considers the original. In other cases, all but the original page may be dropped from the index or never make it into the index in the first place. The harshest case is where none of the pages remain in the index. If this last case were true, though, then by simply mirroring a competitor’s site you could get that site removed from search engines, which I don’t think likely.
Now where does this leave us? Certainly unique and original content is better to have, and if you do have duplicate pages on your site, those pages may not only achieve poor rankings but may also disappear from the index. The truth, though, is that it’s probably a lot more difficult to have two pages considered duplicate than it first seems, which is why I think it’s all a little overblown. Duplicate pages certainly do exist and are to be avoided, but many pages that seem duplicates on the surface simply aren’t when you take a closer look. If you’re worried, make sure to add enough unique content to the page so it’s no longer looked at as duplicate content.