Duplicate content within one domain

Sunday is a good day for catching up on the blog and for a thorough analysis of errors on your own sites. Duplicate content is one of the topics that has interested me most lately, and today I found errors on one of my sites involving the crawling of sort results :/ So this will be another post in the series “Learning from my mistakes” ;-)

  • Introduction
  • Errors causing DC
  • How to block indexing of selected sub-pages?

Introduction

I will start with some information from Google Help:

Google tries hard to index and show only pages with distinct content. This filtering means that if, for example, a site has a standard version and a print version of each page, and neither of them is blocked in robots.txt or with a noindex meta tag, only one of them will be listed in the search results.

That is the theory, but how does it look in practice? As usual, the exception proves the rule. You can easily find search results, sort results, or even print versions in the index alongside the main versions of sub-pages with the same or very similar content. These are the exceptions that have “got away with it” so far ;-) Unfortunately, at any moment only one version may remain, and it does not have to be the one the webmaster wants to see there. I found this out myself in a situation I described recently (in “The homepage under 2 addresses and the DC problem”): after several years, the address with index.php suddenly got indexed (a link to that address had slipped in somewhere on the site), and the main address disappeared from the index. As a result, the sub-pages kept their positions, but the homepage, now indexed under the index.php address, dropped, because that address had no backlinks.

The conclusion is that it is better to protect yourself right away against a possible wrong choice of version than to worry later about how long it will take to get back to the old positions. Coming back to the case on my site: search results were already blocked, but I had overlooked the option to sort by different values. The sort parameters have been added to the list of ignored parameters in Webmaster Tools, but that is not sufficient in this case. Fortunately, I did not notice anything worrying in the SERPs, so I will finish here and go write a patch ;-)

Errors causing DC

Duplication of content at different addresses within the same domain usually occurs in the following cases:

  • The homepage being available under multiple addresses, such as domain.com, www.domain.com, domain.com/index.php, domain.com/home.php, etc. – there are many variants here;
  • Indexing of URLs with session IDs – for Google these are in effect identical versions of the same page;
  • Indexing of sort results – the result may be identical or similar content at different addresses, differing only in the order of certain information;
  • Indexing of search results.

Depending on which case we are dealing with, the solution is:

  • A 301 redirect to the target version;
  • Using rel="canonical";
  • Blocking the indexing of selected addresses.
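
The first two solutions both come down to choosing one target address for every duplicate variant. Below is a minimal sketch of that choice in Python, not a ready-made implementation. The host www.domain.com, the parameter names (sid, phpsessid, sort, order) and the index file names are assumptions made for illustration and need to be adapted to the actual site. The result can be used as the target of a 301 redirect or placed in a rel="canonical" element.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical values: adjust them to the host, parameters and index files of your own site.
CANONICAL_HOST = "www.domain.com"
DUPLICATE_PARAMS = {"sid", "phpsessid", "sort", "order"}
INDEX_FILES = ("index.php", "home.php")


def canonical_url(url: str) -> str:
    """Return the single address that all duplicate variants should point to."""
    parts = urlsplit(url)

    # One host for the whole site: domain.com and www.domain.com collapse into one.
    host = parts.netloc.lower()
    if host in ("domain.com", "www.domain.com"):
        host = CANONICAL_HOST

    # /index.php and /home.php show the same page as the directory itself.
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Drop session and sorting parameters; keep the rest in their original order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DUPLICATE_PARAMS
    ]

    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def canonical_link_tag(url: str) -> str:
    """Build the rel="canonical" element for the <head> of a duplicate page."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

If the requested address differs from canonical_url(address), the page can either answer with a 301 redirect to the canonical address or print canonical_link_tag(address) in its head section. A redirect suits the homepage variants, while rel="canonical" suits sorted listings that visitors still need to reach.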

How to block indexing of selected sub-pages?

The best way to protect yourself against unwanted indexing of addresses is a combination of the following methods:

  • Nofollow on links pointing to the address – thanks to this the robot should not follow these links. This method alone is not sufficient, because Google can find links on other sites and reach the sub-page through them;
  • Blocking in robots.txt – this forbids robots from entering the selected sub-pages, but if they have already been indexed, the result will only be displayed in a non-standard form, with the bare address shown in place of the title;
  • Noindex – only this way can you get rid of sub-pages from the search results.
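
The last two methods can be sketched in Python as follows. This is only an illustration under assumptions: the /search path, the parameter names sort and order, and the function names are hypothetical. The robots.txt rule is checked with the standard urllib.robotparser module, which matches simple path prefixes. Note that a robot can only read a noindex tag on a page it is allowed to fetch, so in this sketch the sorted listings get the tag and are not listed in robots.txt.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and parameter names, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""
SORT_PARAMS = {"sort", "order"}


def blocked_by_robots_txt(url: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the robots.txt rules above forbid fetching the address."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


def robots_meta_tag(url: str) -> str:
    """Return a noindex meta tag for sorted listings, or an empty string otherwise."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if SORT_PARAMS & {name.lower() for name in params}:
        return '<meta name="robots" content="noindex">'
    return ""
```

The page template would print the result of robots_meta_tag(address) inside the head section, so only the sorted variants of a listing carry the tag and the default view stays indexable.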

To speed things up, you can use the URL removal tool in Webmaster Tools.

Soon I am going to write about duplicate content across multiple domains and the problems I have observed in connection with it. I do not know whether I will manage it during the week or only next weekend ;-)