What is a duplicate?
Duplicate content, often simply called a "duplicate", occurs when identical or nearly identical content appears under different URLs on the Internet. This can happen both deliberately and accidentally.
Search engines such as Google recognize these duplicates and often have difficulty selecting the relevant original page and prioritizing it in the search results. This can not only lead to a poorer ranking but also significantly impair the effectiveness of SEO measures.
Types and examples of duplicate content
Duplicate content can take various forms. It can occur either within a single domain (internal duplicate content) or across multiple domains (external duplicate content). Examples:
- A print version and an HTML version of the same article.
- Pages with and without a trailing slash, such as example.com/page and example.com/page/.
- HTTPS and HTTP versions of the same website.
Identifying and eliminating such duplicates is essential for an effective SEO strategy.
Negative effects of duplicate content
Duplicate content can have several undesirable consequences. One of the biggest challenges is that search engines split the ranking potential, also known as "link juice", across several pages instead of concentrating it on one central page. As a result, none of the duplicate pages receives the full SEO benefit.
In addition, duplicates unnecessarily increase the number of pages that search engine bots have to crawl and process. This can affect the indexing and findability of important web pages. It is therefore crucial to identify duplicate content and take appropriate measures to clearly distinguish between original and duplicate content.
Detection and challenges of duplicate URLs
Identifying duplicate URLs is crucial for optimizing the SEO performance of a website. Duplicate content is often caused by factors such as different URL parameters, print versions of pages, or the accessibility of pages under both HTTP and HTTPS. Without a canonical tag, Google considers these pages to be equivalent duplicates, which creates the problem that no clear priority is set between the pages.
Tools and techniques for identification
An effective way of identifying duplicate content is to use Google Search Console. Under the menu item "Index" and then "Pages", URLs can be checked for successful and failed indexing. The index coverage report helps to identify affected URLs and their errors. Other helpful tools are specialized SEO analysis tools that can generate automated reports on duplicate content.
Challenges in elimination
Removing duplicate URLs requires precise and consistent action. One of the challenges is choosing the right URL to define as canonical. Recommended methods include setting a rel=canonical tag in the HTML code or sending a corresponding HTTP header in the page response. For larger websites, it also makes sense to list the canonical pages in a sitemap. Another tried and tested method is the implementation of 301 redirects, especially if a duplicated page is no longer actively used or is to be taken offline.
However, it is important to note that the robots.txt file is not a tool for canonicalization; instead, internal links should consistently point to the canonical URL. Furthermore, when implementing hreflang tags, a canonical page must always be specified, and Google generally prefers HTTPS pages over HTTP pages.
Methods for defining a canonical URL
Determining a canonical URL is a crucial step in avoiding duplicate content and allowing search engines to prioritize clearly. There are various methods for specifying a canonical URL, each with its own areas of application and advantages.
Rel=canonical tag
The rel=canonical tag is probably the most common method of designating a canonical URL. This tag is embedded directly in the HTML code of a page and indicates to search engines which version of a page should be regarded as the original or main version. This method is best suited for HTML pages and is relatively easy to implement.
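For illustration, such a tag is placed in the head of every duplicate version of a page; the domain and path below are placeholders:

```html
<!-- In the <head> of each duplicate page; example.com/page/ stands
     for the preferred (canonical) URL of your content -->
<link rel="canonical" href="https://example.com/page/" />
```

Each duplicate (print version, URL with parameters, etc.) points to the same preferred URL; the canonical page itself may also carry a self-referencing tag.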
HTTP header rel=canonical
An alternative method is to send a "rel=canonical" header in the HTTP response of the page. This method is particularly useful for keeping the number of indexed page versions under control, and it can also be used to canonicalize content that is not HTML, such as PDF files.
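As a sketch, a response for a PDF file could carry such a header; the domain and file path are purely illustrative:

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.com/downloads/whitepaper.pdf>; rel="canonical"
```

The Link header tells search engines which URL to treat as canonical even though no HTML tag can be embedded in the document itself.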
Sitemap and 301 redirection
For larger websites, it is advisable to list the canonical pages directly in a sitemap. This makes it easier for search engines to index the site correctly. Another common practice is the use of 301 redirects. This redirect method is particularly useful if a duplicated page is no longer in use or is to be permanently removed.
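On an Apache web server, for example, such a permanent redirect can be configured in the .htaccess file; the paths here are placeholders:

```apache
# Permanently (301) redirect the duplicate URL to the canonical URL
Redirect 301 /old-page https://example.com/page/
```

After the redirect is in place, both visitors and crawlers requesting the old URL are sent to the canonical page, which consolidates the ranking signals there.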
It is crucial to ensure that no conflicting canonical URLs are specified for the same page and that internal links within the website consistently point to the canonical URL. It should also be noted that when using hreflang tags, a canonical page must always be specified, and that HTTPS pages are preferred over HTTP pages.
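When combining hreflang with canonical tags, each language version declares itself as canonical and lists all language alternates; the URLs below are placeholders:

```html
<!-- In the <head> of https://example.com/en/page/ -->
<link rel="canonical" href="https://example.com/en/page/" />
<link rel="alternate" hreflang="en" href="https://example.com/en/page/" />
<link rel="alternate" hreflang="de" href="https://example.com/de/page/" />
```

The German version would carry the mirror-image markup, with its own URL as canonical and the same two hreflang alternates.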
Common crawling errors and their causes
Crawling errors occur when search engine bots have difficulty accessing or indexing a page or specific content. These errors can occur for various reasons and must be regularly monitored and corrected to ensure proper search engine optimization and indexing of the page.
Server errors and URL blocking
Server errors (5xx) are caused by problems such as server overload, incorrect server configuration, or server failures. In these cases, the bot is unable to access the page. Continuous monitoring and prompt resolution of these problems is essential. Another common error occurs when URLs are blocked by the robots.txt file. In this case, a rule in robots.txt prevents the crawler from accessing the URL. The solution is to remove the blocking rule in question and thus unblock access.
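A blocking rule of this kind looks like the following in robots.txt; the path is illustrative, and removing the Disallow line restores crawler access:

```text
# robots.txt - this rule blocks all crawlers from the /private/ directory
User-agent: *
Disallow: /private/
```

Note that a rule like this only prevents crawling; as mentioned above, robots.txt is not a canonicalization tool and should not be used to manage duplicates.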
Errors such as "Submitted URL marked noindex" occur when a noindex meta tag is present. To rectify this error, the noindex meta tag can be removed if the URL should be indexed. Soft 404 errors are also common. These occur when a page no longer exists but still returns a 200 status code. An HTTP response code 404 or 410 should be returned here instead.
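On an Apache server, for instance, a permanently removed page can be made to return a proper 410 (Gone) status instead of a soft 404; the path is a placeholder:

```apache
# mod_alias: return HTTP 410 (Gone) for a permanently removed page
Redirect gone /removed-page
```

This signals unambiguously to search engines that the page is gone for good, so it can be dropped from the index.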
Pages not found and indexing problems
A "404 Not Found" error means that the URL does not exist. To rectify this, the incoming links should be checked and redirected if necessary. A "403 Forbidden" error indicates that the search engine bot lacks the necessary access credentials. Here, access for the crawler should be enabled without restriction.
Errors such as "Crawled - currently not indexed" are caused by content that is less relevant for users, low-quality pages, or duplicate content. To solve these problems, duplicate content should be checked, internal links optimized, and helpful content created. Similar measures apply to pages with the status "Discovered - currently not indexed", where the URL has been found but not yet crawled.
Other specific errors
A common status is "Alternate page with proper canonical tag", where the canonical tag already refers to the main version of the content. The status "Duplicate without user-selected canonical" occurs when no canonical tag points to the main version; appropriate tags should be set here. There are also cases in which Google determines another page as canonical, as with "Duplicate, Google chose different canonical than user". Redirects and canonical tags should be checked here and adjusted manually if necessary.
Measures to avoid duplicates in WordPress
Avoiding duplicates in WordPress is an important part of SEO management and requires a targeted approach. One of the most effective methods for identifying and eliminating duplicate content is to use the Google Search Console or special SEO tools that can recognize duplicate content and make appropriate recommendations.
Identification and canonization
The first step is to identify duplicates and determine the main page. Once the main page has been defined, canonical tags can be set. In WordPress, this is done either with an SEO plugin or manually.
With an SEO plugin: Edit the page or post in question in the WordPress admin area, go to the SEO plugin section, scroll to the canonical URL field, enter the canonical URL, and save.
Without a plugin: Switch to text mode in the WordPress editor, insert <link rel="canonical" href="URL_YOUR_MAIN_PAGE" /> in the head area, and update the page.
Structure and internal linking
A clear and unambiguous page structure also helps to avoid duplicate content. Care should be taken to ensure that each page has a unique and clear purpose. Internal linking should be optimized so that it consistently references the canonical URL. It is also advisable to carry out regular SEO audits to ensure that all canonical tags are set correctly and that no new duplicates have been created.
One preventative measure is to avoid duplicates from the outset by ensuring that no superfluous pages or posts are created. If necessary, external help can also be sought to ensure that the measures presented are implemented correctly and that all potential problems are identified and rectified.