Duplicate Content Checker WordPress Plugins are a vital part of any blogger’s toolkit.
Firstly, because WordPress automatically creates duplicate content and that duplicate content could be hurting your rankings.
Secondly, because other websites may be duplicating your content, and that also could damage your SEO. I recently discovered that my entire website had been ‘scraped’ and copied to another website. More about that, and how I dealt with it, below.
But first, what exactly is duplicate content and why does it hurt your search engine rankings?
In a nutshell, duplicate content is content that is identical and can be accessed on two or more different URLs.
The duplication can occur:
- within your own website
- on another website: cross-domain duplication occurs when another website copies your content.
Let’s look at these two different occurrences of duplicate content:
Unfortunately, if your website runs on WordPress, it’s highly likely that you have duplicate content.
But why does WordPress create duplicate content?
Well, strictly speaking, it doesn’t. In the WordPress database there is just one version of your article or blog post.
But WordPress allows that piece of content to be discoverable in numerous different ways, each with its own URL. And as far as the search engines are concerned, those different URLs represent duplicate content.
Let’s say you write an article about email marketing and you give it the permalink ‘email-marketing’. Here are some of the ways that page can be accessed through different URLs:
- http://yourdomain.com/email-marketing
- http://yourdomain.com/tag/email-marketing
- http://www.yourdomain.com/tag/email-marketing
- http://yourdomain.com/category/email-marketing
- http://www.yourdomain.com/Category/email-marketing
- https://yourdomain.com/email-marketing
- https://yourdomain.com/tag/email-marketing
- https://www.yourdomain.com/tag/email-marketing
- https://yourdomain.com/category/email-marketing
- https://www.yourdomain.com/Category/email-marketing
Remember: this is the same piece of content. I’ve simply listed the different ways that piece of content can potentially be found through organic search.
In addition to the above list of alternative URLs, Session ID and tracking parameters (UTMs), which are used on many ecommerce websites, also create unique URLs that can then turn up as duplicate content in the search results.
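To make the idea concrete, here is a minimal Python sketch of how those URL variants could be collapsed into one preferred form. It is only an illustration, not how WordPress or any search engine does it. The function name `normalize_url`, the list of session parameter names, and the choice of `https` without `www` as the preferred form are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters treated as tracking or session noise.
# This is an assumed, non-exhaustive list for illustration only.
TRACKING_PREFIXES = ("utm_",)
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def normalize_url(url, preferred_scheme="https", use_www=False):
    """Collapse common URL variants into one preferred form."""
    parts = urlsplit(url)

    # Host names are case-insensitive, so lowercase before comparing.
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    host = "www." + bare_host if use_www else bare_host

    # Keep only query parameters that are not tracking or session identifiers.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in SESSION_PARAMS
    ]

    # Treat "/page" and "/page/" as the same address.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((preferred_scheme, host, path, urlencode(kept), ""))


variants = [
    "http://yourdomain.com/email-marketing",
    "https://www.yourdomain.com/email-marketing/",
    "https://yourdomain.com/email-marketing?utm_source=newsletter&sessionid=abc123",
]
print({normalize_url(url) for url in variants})
```

All three variants reduce to a single address, which is exactly what you want the search engines to see.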
If you create lots of original content, sooner or later other people are going to want to copy that content and put it on their own website.
Maybe they’re lazy, maybe they don’t have your writing skills, or maybe they don’t even speak or write English.
Whatever the case, it will happen to you sooner or later, especially as the domain authority of your website increases.
I recently discovered an instance where my entire website had been copied to another domain.
In this case the offender had purchased the .org version of my website and simply copied my website onto his domain, as shown in the screenshot below:
Duplicate content on other websites is often the result of ‘content scrapers’ – in other words, it’s been done by a program.
And this means that the internal links that you created are still in the stolen content.
And that in turn means that when this plagiarized content goes live, it creates backlinks to your website.
And that’s how I discover much of the content that has been duplicated from my website to another website: my backlink checker (MonitorBacklinks) sends me a notification that I have acquired a new backlink.
Of course, it’s a backlink I don’t want, and I quickly disavow that link.
But it allows me to keep track of who is stealing my content and report them to Google (more on that below).
So why does duplicate content hurt your search rankings?
There are two ways duplicate content can harm your rankings:
- The SEO for that piece of content is diluted because the different versions are competing with each other
- Eventually, Google could start penalizing your site in the search results
Simply put, the search engines are confused – they find numerous versions of the same content and they don’t know which version to show in the search results.
Search engines will not show results containing the same content for a given search query, so they are forced to choose between the various versions.
This means that different versions of the same content will appear randomly in different search results.
And that, of course, dilutes the ranking for each of the versions.
But the problem doesn’t stop there. Different people will find the different versions of your content in the search results and they will link to whichever version they found.
This means that your link equity, which could have all gone to one URL, is being shared by all the different URLs containing that same content.
The link equity you’re getting for that piece of content is a fraction of what it could be.
The second problem is that eventually Google may start penalizing your website for duplicate content.
When searching for duplicates, it’s important to remember there are two kinds of duplicate content:
- Duplicate content on your site
- Duplicate content on other websites
Google Search Console
A good way to find duplicate content on your website is Google Search Console.
Log in to your GSC and in the left panel, at the bottom, click on ‘Go to the old version’:
In the old version of GSC, go to Search Appearance and then HTML Improvements. If you have any duplicate content issues, you’ll see them listed here:
Just go to Siteliner and type in the URL of your website:
Siteliner will give you a Duplicate Content report that shows your duplicate content as a percentage of total content (so you can see the size of the problem) and a list of the offending URLs.
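If you’re curious how a duplicate-content percentage like the one in that report could be calculated, here is a rough Python sketch. It is not the tool’s actual method, which I don’t know. It simply breaks each text into overlapping runs of seven words (often called ‘shingles’) and measures how many of the original’s runs also appear in the other text. The function names and the seven-word window are assumptions for the example.

```python
import re


def shingles(text, size=7):
    """Return the set of overlapping word sequences ('shingles') in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_percentage(original, candidate, size=7):
    """Percentage of the original's shingles that also appear in the candidate."""
    source = shingles(original, size)
    if not source:
        return 0.0
    shared = source & shingles(candidate, size)
    return 100.0 * len(shared) / len(source)


original = (
    "duplicate content is content that is identical and can be "
    "accessed on two or more different URLs"
)
scraped = "Welcome to my blog. " + original
print(duplicate_percentage(original, scraped))
```

A page that contains your whole paragraph scores 100, while an unrelated page scores 0. Real tools are more forgiving of small edits, so treat this as a way to understand the number rather than a replacement for the report.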
One way to discover whether your content has been duplicated on other websites is to perform a manual search.
Simply take a phrase consisting of seven or more words from your article or blog post and copy it into Google search, enclosed in double quotation marks so that Google looks for an exact match.
However, if you have 100 or more blog posts, that becomes very time-consuming.
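If you do want to run manual searches for a handful of posts, the search link itself is easy to build. Google treats a phrase wrapped in double quotation marks as an exact-match search, so this small Python sketch wraps the phrase in quotes and encodes it into a search URL you can paste into your browser. The function name is my own, and the script only builds the link. It does not fetch or read any results.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase):
    """Build a Google search URL that looks for the phrase as an exact match."""
    # Strip stray whitespace and quotes, then add one clean pair of quotes.
    quoted = '"' + phrase.strip().strip('"') + '"'
    return "https://www.google.com/search?" + urlencode({"q": quoted})


print(exact_phrase_search_url("duplicate content is content that is identical"))
```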
Here’s a quicker way of finding duplicate content. It’s not free, but for the price of two lattes, it’s worth doing.
Go to Copyscape and purchase $10 credit (the minimum). Then simply type in the URL of your website.
This is how I discovered that my entire site had been copied to another domain:
Just as there are two kinds of duplicate content, there are two ways of dealing with it:
Remember how we talked about all the different URLs on your site that can show the same content?
Here are some of the forms it takes and how to deal with it:
The solution to duplicate content created by WordPress is the ‘Canonical URL’ – that’s the original URL and the one the search engines want to know about.
There’s now a tag that you can insert in the head section of every page you create, called the ‘rel=canonical’ tag or the ‘canonical link’. It tells the search engines that this is the preferred version of this particular page or content.
When there are numerous different URLs all containing the same content, this tag quickly tells the search engines which one is preferred and which one they should be showing in the search results.
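To check whether a page actually carries a canonical tag, you can look at its HTML source. The following Python sketch does that with the standard library’s HTML parser. The sample page and the names `CanonicalFinder` and `find_canonical` are made up for the example, and the sketch only reads HTML you hand to it. It does not download anything.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html):
    """Return the canonical URLs declared in the given HTML source."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


sample_page = """
<html>
  <head>
    <title>Email Marketing</title>
    <link rel="canonical" href="https://yourdomain.com/email-marketing" />
  </head>
  <body><p>Post content goes here.</p></body>
</html>
"""
print(find_canonical(sample_page))
```

A healthy page should return exactly one URL. An empty list means the tag is missing, and more than one entry usually means two plugins are both adding it.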
By default, the Yoast SEO plugin automatically inserts the ‘rel=canonical’ tag into every page or post you create. So, simply installing the free version of Yoast SEO will go a long way to preventing problems with duplicate content on your website.
Much of the duplicate content on a WordPress site arises because of WordPress Categories and WordPress Tags.
WordPress Categories and Tags are the two principal ways of organizing content on a WordPress site. Categories group together posts or articles on the same topic while tags assign specific keywords to particular posts or articles.
Categories and Tags are very useful but they’re also a major cause of duplicate content. Why?
Because WordPress assigns URLs to categories and tags. And those URLs will turn up in the search results as duplicate content.
But there’s an easy fix for this: using the “noindex, follow” tag. This tag tells the search engine robots to follow the links in categories and tags but not to index them.
To implement the “noindex, follow” for Categories go to Yoast (free version) in your WordPress dashboard and click on ‘Search Appearance’ and then ‘Taxonomies’.
Toggle the ‘Show categories’ button to ‘No’:
Scroll down and do the same for Tags:
Yoast will now apply the “noindex, follow” tag to your categories and tags and they will stop appearing in the search results.
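If you want to confirm that the setting has taken effect, view the source of one of your category or tag pages and look for the robots meta tag. Here is a small Python sketch that pulls the directives out of a page’s HTML. The sample markup and the names `RobotsMetaParser` and `robots_directives` are assumptions for the example. The exact markup your site outputs may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in the HTML source."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


category_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
found = robots_directives(category_page)
print("noindex" in found, "follow" in found)
```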
Every time you insert media (e.g. photos) into a blog post or a page, WordPress creates a new page specifically for that item of media.
The problem with attachment URLs is that ever since the Panda Update, Google has been penalizing thin content.
And that’s exactly what attachment URLs are. They show up in the search results as a page but there’s nothing there except the item of media.
So, while attachment URLs are not exactly duplicate content, they get generated every time you create new content and they can harm your SEO.
The easiest way to deal with attachment URLs is to redirect them to the parent article.
You can do this in one stroke by changing the settings in Yoast.
If you go to the ‘Media’ tab in Yoast, you’ll see the following explanation as to why it’s better to redirect attachment URLs to the parent post that they came from:
Just toggle the button to ‘Yes’, and all those ‘thin content’ attachment URLs will automatically redirect to the blog post they belong to.
You should be aware, however, that in the past Yoast has released updates that accidentally turned this setting off.
If that happens, your attachment URLs suddenly start showing up in the Google search results as ‘thin content’ and you can get hit with a Google penalty.
A 301 Redirect simply redirects traffic from one URL to another.
For example, when I started blogging, I allowed WordPress to create the post’s slug for me.
This resulted in URLs that looked like this:
I’ve since learned that long URLs are not good for SEO and that all I need in the slug is the keyword.
So, I went through all my old blog posts and changed many of the slugs to a shorter version, such as:
But this can create duplicate content, because the old URL may still be indexed or linked to.
The solution? A 301 Redirect.
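Here is a minimal Python sketch of the logic behind a 301 redirect, just to show what a redirect plugin is doing for you in principle. It is not the code of any particular plugin. The function name and the slugs in the example are hypothetical. The idea is a simple lookup table from old addresses to new ones: an old address gets a 301 (moved permanently) pointing at its final destination, and everything else is served normally.

```python
def resolve_redirect(path, redirects):
    """Return (status, location) for a requested path.

    `redirects` maps old paths to new ones. A path found in the map gets a
    301 pointing at its final destination; any other path is served with a 200.
    """
    seen = set()
    target = path
    # Follow chained redirects so visitors land on the final URL in one hop.
    while target in redirects:
        if target in seen:
            raise ValueError(f"redirect loop detected at {target!r}")
        seen.add(target)
        target = redirects[target]
    if target == path:
        return 200, path
    return 301, target


# Hypothetical slugs, for illustration only.
redirects = {
    "/my-very-long-original-post-slug": "/short-slug",
    "/short-slug": "/shorter",
}
print(resolve_redirect("/my-very-long-original-post-slug", redirects))
print(resolve_redirect("/shorter", redirects))
```

Because the old address answers with a 301 instead of serving the page again, only one URL ends up showing that content.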
Here are two plugins for creating 301 redirects:
Redirection – I used to use this plugin for redirecting from the old version of a post to the new version.
Yoast SEO – with the premium version of Yoast, when you change the slug of a post or page Yoast displays a message saying it created a 301 redirect. Yoast also provides an option to undo the redirection.
www versus non-www
This can be another source of duplicate content. When people link to your site, they may use either form of URL.
Google will often treat them as separate URLs, resulting in duplicate content.
The way to deal with this is to tell Google which is your preferred URL format (with ‘www’ or without).
Go to your Google Search Console account (old version) and click on the cog in the top right of the screen and choose which URL version is your preferred option:
Setting a preferred URL is also good for SEO.
Why is that?
Because if half your links are in the ‘www’ format and half are in the ‘non-www’ format, neither format is receiving the full amount of link equity that you would otherwise be getting from those backlinks.
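If you have a list of your backlinks (most backlink checkers will let you export one), you can see how that split looks for your own site. This Python sketch simply counts how many links point at each form. The function name and the sample URLs are made up for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_host_forms(backlink_targets):
    """Count how many backlinks point at the 'www' and 'non-www' forms."""
    counts = Counter()
    for url in backlink_targets:
        host = urlsplit(url).netloc.lower()
        counts["www" if host.startswith("www.") else "non-www"] += 1
    return counts


# Sample data for illustration only.
backlink_targets = [
    "https://www.yourdomain.com/email-marketing",
    "https://yourdomain.com/email-marketing",
    "http://www.yourdomain.com/email-marketing",
    "https://yourdomain.com/email-marketing",
]
print(count_host_forms(backlink_targets))
```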
If you find that another website has copied your content without your permission, there are three ways of dealing with it:
Report It to Their Web Host
As I mentioned above, I recently discovered that my entire website had been copied to another domain.
I immediately looked up the offending website in ICANN’s WHOIS Lookup and found that it was hosted with GoDaddy.
As it happens GoDaddy has a very strong policy on DMCA infringements. I immediately sent an email to GoDaddy’s Copyright Infringement department asking them to remove the stolen material.
Report It to Google
If someone has infringed your copyright by lifting content from your site and placing it on their site, you should report it to Google.
Go to Google’s legal page and fill out the online form.
Of course, all Google can do is remove the offending site from their search results (they cannot actually have the offending site taken down).
In 2009 Google introduced a new tag to deal with situations where your content appears verbatim on someone else’s website. This duplication could be with, or without, your permission.
You may have given someone permission to reproduce your article or they may have stolen it from your website. Either way, by using this tag your version is the one that will show in the search results.
By using the cross-domain canonical tag, Moz estimates that about 90% of the link juice, authority, and ranking signals will transfer from the duplicate content to your page.
The following WP plugins will check your website for duplicate content.
Some of these plugins address the problem of duplicate content on your website while others address the issue of duplicate content on other websites.
Yoast SEO is the plugin I’ve referred to throughout this article.
The Yoast SEO plugin lets you:
- Create a Canonical URL tag for every new blog post or article
- Remove Categories and Tags from Google search results
- Redirect attachment URLs to the parent post they came from
In my opinion it’s the best plugin for quickly and easily eliminating duplicate content issues in WordPress.
Duplicate Content Cure is a fairly simple plugin that makes your WordPress site more SEO friendly by preventing the search engines from indexing archives, tags, and categories, which usually contain duplicate content.
The plugin does this by adding the ‘nofollow, noindex’ tag to these pages.
Dooplee Duplicate Content Checker monitors your last 10 blog posts for instances where scrapers or ‘auto blogs’ have copied your content to another website.
The plugin also contains a form for lodging a DMCA complaint and provides suggestions as to how to deal with the plagiarized or stolen content.
The plugin attempts to have the offending content taken down. Where that’s not possible, the plugin attempts to have the offending material removed from the Google and Bing search engines, so that it is not competing with the original version.
Delete Duplicate Posts simply searches and removes duplicate posts and their meta data. The plugin is aimed primarily at cleaning up space on your WordPress website rather than addressing SEO issues that arise from duplicate content.
Fix Duplicates is designed for sites that accept user-generated content. The plugin deals with the problem of users submitting the same post again and again.
This tool deletes duplicate content and creates a 301 redirect back to the original version, thus preserving the link equity of the removed content.
Plagiarism is designed primarily to check that your content doesn’t duplicate someone else’s content. It checks the content of the post you’re about to publish to make sure it doesn’t contain any plagiarism.
But this plugin can also be used to check for websites that have scraped your existing content.
Google Plus Authorship
One of the best ways to protect your online content is to establish with Google that you are the author of the content.
You can do this through Google Authorship, a system Google introduced in 2011. Although Google has dropped the Authorship rich snippets from the search results, Google Authorship can still be used to identify to Google that you are the author of a piece of content.
The Google Plus Authorship plugin does this by linking your article to your Google+ account.
Duplicate content comes in two forms: duplicate content generated by the WordPress platform and duplicate content that occurs when someone scrapes your content and places it on another website.
Either form of duplicate content can seriously damage your SEO performance so it’s important to take steps to eliminate both.
Follow the tips in this article and you won’t have to worry anymore about duplicate content.