What is a duplicate?
Duplicate content (often simply called a "duplicate") occurs when identical or nearly identical content appears under different URLs on the internet. This can happen both deliberately and accidentally.
Search engines such as Google recognize these duplicates and often have difficulty selecting the relevant original page and prioritizing it in the search results. This can not only lead to a poorer ranking but also significantly impair the effectiveness of SEO measures.
Typologies and examples of duplicate content
Duplicate content can take various forms. It can occur either within a single domain (internal duplicate content) or across multiple domains (external duplicate content). Examples:
- A print version and an HTML version of the same article.
- Pages with and without a trailing slash, such as example.com/page and example.com/page/.
- HTTPS and HTTP versions of the same website.
Identifying and eliminating such duplicates is essential for an effective SEO strategy.
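Many of the duplicate variants listed above can be collapsed by normalizing URLs before comparing them. The following sketch illustrates the idea; the specific normalization rules (forcing HTTPS, lowercasing the host, dropping the trailing slash) are illustrative assumptions, not an exhaustive policy:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Collapse common duplicate variants into one canonical form.
    The rules below are illustrative, not a complete normalization policy."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = "https"                  # treat HTTP and HTTPS as the same page
    netloc = netloc.lower()           # hostnames are case-insensitive
    path = path.rstrip("/") or "/"    # treat /page and /page/ as the same page
    return urlunsplit((scheme, netloc, path, query, ""))

print(normalize_url("http://Example.com/page/"))  # https://example.com/page
print(normalize_url("https://example.com/page"))  # https://example.com/page
```

Two URLs that normalize to the same string are duplicate candidates and should point to a single canonical page.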
Negative effects of duplicate content
Duplicate content can have several undesirable consequences. One of the biggest challenges is that search engines distribute the ranking potential, also known as "link juice", across several pages instead of concentrating it on one central page. As a result, none of the duplicate pages receives the full SEO benefit.
In addition, duplicates unnecessarily increase the number of pages that search engine bots have to crawl and process, which can impair the indexing and findability of important web pages. It is therefore crucial to identify duplicate content and take appropriate measures to clearly distinguish the original from its duplicates.
Detection and challenges of duplicate URLs
Identifying duplicate URLs is crucial to optimizing the SEO performance of a website. Duplicate content is often caused by factors such as differing URL parameters, print versions of pages, or pages being reachable under both HTTP and HTTPS. Without a canonical marker, Google treats these pages as equivalent duplicates, which creates the problem that no clear priority is established between them.
Tools and techniques for identification
An effective way of identifying duplicate content is to use the Google Search Console. Under the menu item "Index" and then "Pages", URLs can be checked for successful and incorrect indexing. The index coverage report helps to identify affected URLs and their errors. Other helpful tools are specialized SEO analysis tools that can generate automated reports on duplicate content.
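Beyond the Search Console, duplicates can also be detected programmatically by fingerprinting page content and flagging URLs whose fingerprints collide. The sketch below uses whitespace- and case-insensitive hashing as a deliberately simple assumption; real SEO tools apply fuzzier similarity measures:

```python
import hashlib

def content_fingerprint(html_text: str) -> str:
    """Hash the content with whitespace collapsed and case folded, so
    trivially reformatted copies produce the same fingerprint."""
    normalized = " ".join(html_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical crawled pages (URL -> body) for illustration
pages = {
    "https://example.com/page":  "<p>Hello   World</p>",
    "https://example.com/page/": "<p>hello world</p>",
}

seen = {}
for url, body in pages.items():
    fp = content_fingerprint(body)
    if fp in seen:
        print(f"duplicate: {url} matches {seen[fp]}")
    seen.setdefault(fp, url)
```

Any two URLs sharing a fingerprint are candidates for canonicalization or redirection.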
Challenges in elimination
Removing duplicate URLs requires precise and consistent measures. One of the challenges is determining the correct canonical URL. Recommended methods include setting a rel=canonical tag in the HTML code or sending a corresponding HTTP header in the page response. For larger websites, it also makes sense to list the canonical pages in a sitemap. Another tried-and-tested method is implementing 301 redirects, especially when a duplicated page is no longer actively used or is to be retired.
However, it is important to note that the robots.txt file is not used for canonicalization and that internal links within the website should consistently point to the canonical URL. Furthermore, when implementing hreflang tags, a canonical page must always be specified, and Google always prefers HTTPS pages over HTTP pages.
Methods for defining a canonical URL
Defining a canonical URL is a decisive step in avoiding duplicate content and enabling search engines to prioritize clearly. There are various methods for specifying a canonical URL, each with its own areas of application and advantages.
Rel=canonical tag
The rel=canonical tag is probably the most common method of designating a canonical URL. This tag is embedded directly in the <head> section of a page's HTML code and tells search engines which version of a page should be regarded as the original or main version. This method is best suited to HTML pages and is relatively easy to implement.
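Whether a page already declares a canonical URL can be checked programmatically. A minimal sketch using Python's standard html.parser (the class name is illustrative; it simply reads the href of any rel="canonical" link element):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of a <link rel="canonical"> tag, if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "canonical":
            self.canonical = attributes.get("href")

html = '<html><head><link rel="canonical" href="https://example.com/page"></head></html>'
finder = CanonicalFinder()
finder.feed(html)
print(finder.canonical)  # https://example.com/page
```

A page with no canonical tag yields None, which marks it for review during an audit.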
HTTP header rel=canonical
An alternative method is to send a rel=canonical link header in the HTTP response of the page. This method is particularly useful for preventing the index from being inflated by different versions of a page, and it can also be used to effectively canonicalize content that is not HTML.
Sitemap and 301 redirection
For larger websites, it is advisable to specify the canonical pages directly in a sitemap. This makes correct indexing easier for search engines. Another common practice is the use of 301 redirects. This redirect method is particularly useful when a duplicated page is no longer in use or is to be permanently removed.
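The decision logic behind a 301 redirect can be sketched as a simple lookup from retired duplicate URLs to their canonical targets. The mapping and function below are illustrative assumptions; in practice this configuration lives in the web server or CMS:

```python
# Hypothetical redirect table: retired duplicate path -> canonical path
REDIRECTS = {
    "/page/":      "/page",
    "/print/page": "/page",
}

def handle_request(path: str):
    """Return (status, location) the way a server would:
    301 with the canonical target for retired duplicates, 200 otherwise."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the canonical URL
    return 200, path

print(handle_request("/print/page"))  # (301, '/page')
print(handle_request("/page"))        # (200, '/page')
```

Using status 301 (rather than 302) signals that the move is permanent, so search engines transfer the ranking signals to the canonical URL.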
It is crucial to ensure that no conflicting canonical URLs are specified for the same page and that internal links within the website consistently point to the canonical URL. It should also be noted that when using hreflang tags, a canonical page must always be specified, and HTTPS pages are preferred over HTTP pages.
Common crawling errors and their causes
Crawling errors occur when search engine bots have difficulty indexing a page or specific content. These errors can arise for various reasons and must be monitored and corrected regularly to ensure the search engine optimization and indexing of the page.
Server errors and URL blocking
Server errors (5xx) are caused by problems such as server overload, incorrect server configuration, or server failures. In these cases, the bot is unable to access the page. Continuous monitoring and immediate resolution of these problems is essential. Another common error occurs when URLs are blocked by the robots.txt file: a rule in robots.txt prevents the bot from accessing the URL. The solution is to remove the blocking rule in question and thereby restore access.
Errors such as "Submitted URL marked 'noindex'" occur when a noindex meta tag is present. To rectify this error, the noindex meta tag can be removed if the URL should be indexed. Soft 404 errors are also common: they occur when a page does not exist but still returns a 200 status code. An HTTP response code of 404 or 410 should be returned here instead.
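The soft-404 case can be made concrete: the body reads like an error page, but the status code claims success. The sketch below uses a crude phrase-matching heuristic purely for illustration; real crawlers combine many more signals:

```python
def is_soft_404(status_code: int, body: str) -> bool:
    """A soft 404: the server answers 200 although the body is an error
    page. Phrase matching is a simplistic, illustrative heuristic."""
    error_phrases = ("page not found", "does not exist", "404")
    looks_like_error = any(p in body.lower() for p in error_phrases)
    return status_code == 200 and looks_like_error

print(is_soft_404(200, "<h1>Page not found</h1>"))  # True  (soft 404)
print(is_soft_404(404, "<h1>Page not found</h1>"))  # False (a proper 404)
```

The fix is on the server side: return a genuine 404 or 410 status so that crawlers stop treating the error page as indexable content.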
Pages not found and indexing problems
A "404 Not Found" error means that the URL does not exist. To fix this, incoming links should be checked and redirected if necessary. A "403 Forbidden" error indicates that the search engine bot lacks the necessary access rights. Here, access for the crawler should be enabled without restriction.
Errors such as "Crawled - currently not indexed" are caused by content that is less relevant for users, low-quality pages, or duplicate content. To solve these problems, duplicate content should be checked, internal links optimized, and more helpful content created. Similar measures apply to pages with the status "Discovered - currently not indexed", where the URL has been found but not yet crawled.
Other specific errors
A common message is "Alternate page with proper canonical tag", where the canonical tag already points to the main version of the content. The error "Duplicate without user-selected canonical" occurs when no canonical tag is set on the main version; appropriate tags should be set here. There are also cases in which Google designates a different page as canonical, as with "Duplicate, Google chose different canonical than user". Redirects should be checked here and adjusted manually if necessary.
Measures to avoid duplicates in WordPress
Avoiding duplicates in WordPress is an important part of SEO management and requires a targeted approach. One of the most effective ways to identify and eliminate duplicate content is to use the Google Search Console or specialized SEO tools that detect duplicate content and make appropriate recommendations.
Identification and canonization
The first step is to identify duplicates and determine the main page. Once the main page has been defined, canonical tags can be set. This is done in WordPress either with an SEO plugin or manually.
With an SEO plugin: edit the page or post in question in the WordPress admin area, go to the SEO plugin's section, scroll to the canonical URL field, enter the canonical URL, and save.
Without a plugin: switch to text mode in the WordPress editor, insert <link rel="canonical" href="URL_YOUR_MAIN_PAGE" /> in the <head> area, and update the page.
Structure and internal linking
A clear and unambiguous page structure also helps to avoid duplicate content. Care should be taken to ensure that each page has a unique and clear purpose. Internal linking should be optimized so that it consistently points to the canonical URL. It is also advisable to carry out regular SEO audits to ensure that all canonical tags are set correctly and that no new duplicates have arisen.
One preventative measure is to avoid duplicates from the outset by ensuring that no superfluous pages or posts are created. If necessary, external help can also be sought to ensure that the measures presented are implemented correctly and that all potential problems are identified and rectified.