How CMOs Can Help Prevent and Fix Duplicate Content Issues
Duplicate content issues can be a nuisance and a problem for websites. Find out how to identify, fix, and prevent duplicate content issues to keep your site’s performance at its best.
If you’re the owner of a mom-and-pop business, you most likely focus your company’s online presence on your website and a few social media pages. But as the chief marketing officer of a big brand, you are probably on a number of web platforms. While you like the idea of this extra exposure, you don’t want Google to ding you for duplicate content — which can hurt your ranking and SEO strategies. Fortunately, there are a number of tips you can use to fix existing duplicate content issues and prevent them from happening again.
Types of Duplicate Content Issues
Duplicate content comes in many forms and not always as you would expect, so it can be helpful to identify sources and types of duplicate content issues.
A prime example comes from the popular content management system (CMS) WordPress. When you create a new post, the platform automatically creates several “instances” of that post: one each under the category, tag, date archive, and author pages, in addition to the original post URL. So while you may think that WordPress is creating a single post, it is actually creating nearly half a dozen duplicates. For example, on a poorly optimized WordPress blog, the same article can surface at all of the following (illustrative) URLs:
- http://example.com/seo-post (the original post URL)
- http://example.com/category/seo/ (category archive)
- http://example.com/tag/seo/ (tag archive)
- http://example.com/2023/06/ (date archive)
- http://example.com/author/jane/ (author archive)
- http://domain23.com/SEO-post (the same article republished on an external website)
We’ve mentioned the Yoast SEO plugin many times on our blog, and here again, installing this simple plugin is the easiest way to remedy these types of duplicate content issues.
The WordPress SEO by Yoast plugin allows you to automatically add “noindex, follow” meta tags to all of your archive pages. The settings live on the “SEO” > “Titles & Metas” page, under the “Post Types” and “Taxonomies” tabs, and they help immensely with canonicalization. You can also check out this quick video to help you set up Yoast correctly to avoid duplicate content on your site.
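If you want to spot-check that your archive pages actually carry those tags, you can parse a page’s HTML head with Python’s standard library. This is a minimal sketch: the `archive_html` sample below is illustrative, not real Yoast output, and `example.com` is a placeholder domain.

```python
# Minimal sketch: check whether a page's HTML carries a "noindex, follow"
# robots meta tag and a canonical link, the kind Yoast adds to archive pages.
from html.parser import HTMLParser

class HeadTagAudit(HTMLParser):
    """Collects the robots meta content and canonical href from a page."""
    def __init__(self):
        super().__init__()
        self.robots = None      # content of <meta name="robots">
        self.canonical = None   # href of <link rel="canonical">

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content", "")
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href", "")

# Illustrative archive-page HTML (not real Yoast output).
archive_html = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="http://example.com/seo-post/">
</head><body>...</body></html>"""

audit = HeadTagAudit()
audit.feed(archive_html)
print(audit.robots)     # noindex, follow
print(audit.canonical)  # http://example.com/seo-post/
```

In practice you would fetch each archive URL and feed its HTML through the same parser; a missing `noindex` on a category, tag, or author page is a sign the plugin settings need another look.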
The good news is that search engines typically won’t penalize a website for small amounts of duplicate content.
As I mentioned previously, it’s estimated that roughly 30% of the entire Internet consists of duplicate content. If every website were penalized for it, the search results would look entirely different (and not in a good way!).
Video: Matt Cutts on Duplicate Content
Click the play button above to see Google’s former Webspam team leader answer the question: how does Google handle duplicate content?
Cutts reveals Google’s method for indexing pages with duplicate content: it is usually the first page to publish the content that gets ranked. Any websites or pages that publish it afterwards will generally not show up in Google’s search results.
It is worth pointing out, however, that studies run by reputable online marketers such as ClickZ, SearchEngineJournal, and MarketingLand have shown that higher-authority sites, which can include scraper sites, can outrank you using your own content.
A prime example of this would be a small business site reposting content on a site like Medium. This happens pretty frequently when website owners try to grow their audience by reaching new viewers through other online sources.
Cutts does go on to say, however, that duplicate content can result in a rankings penalty. Instances such as this usually involve automated programs or software that scrape content from other sources and publish it without the author’s consent. Known as “scraper sites” (for obvious reasons), they tend to contain nothing more than duplicate content that’s been rehashed across multiple websites.
This kind of content duplication can be checked and monitored using a third-party service such as Copyscape.
Copyscape offers a free URL search, with results coming in within a few seconds. While the free version doesn’t do deep searches (breaking down the text in order to search for partial duplication), it does a thorough job of finding exact matches. If you have found two URLs or text blocks that appear similar, Copyscape also has a free comparison tool that will highlight duplicate content in the text. The free service limits the number of searches per site, but Copyscape Premium (the paid account) allows unlimited searches, deep searches, searches of text excerpts and full sites, and monthly plagiarism monitoring.
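To give a feel for how such tools compare texts, here is a rough sketch of one common approach: break each text into overlapping word “shingles” and measure their Jaccard overlap. This is an illustration only, not Copyscape’s actual algorithm, and the sample sentences are made up.

```python
# Sketch of near-duplicate detection via word shingles and Jaccard similarity.
# (Illustrative only; real services use far more sophisticated matching.)
def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Similarity between two texts: shared shingles over total shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original  = "duplicate content comes in many forms and not always as you would expect"
copied    = "duplicate content comes in many forms and not always as you would expect"
rewritten = "content that is duplicated appears in several forms you might not expect"

print(jaccard(original, copied))     # 1.0 — exact match
print(jaccard(original, rewritten))  # 0.0 — no shared 3-word sequence
```

Note that a thorough paraphrase scores near zero with this simple method, which is why deep searches for partial duplication are harder than exact-match checks.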
Aggregation vs. Scraping Content
WordPress has many SEO benefits, including plugins like WP RSS Aggregator that help aggregate and feed relevant content to your own audience. This can be very beneficial in helping a small blog build content to slake its viewers’ appetite for more; however, there is a fine line between aggregating content and scraping content (duplicating it in whole) for the sake of rankings.
Plugins like WP RSS Aggregator make it very easy to cross this line, as I found out for myself while testing. The plugin allows you to pull in full articles from any RSS feed, supposedly with the appropriate attribution to the original author. Don’t be fooled: reposting an article in full, even with attribution, is copying someone’s content without approval and can be grounds for search removal action or even a penalty. Biznology wrote a great in-depth article about this very thing titled “Scraping vs. Aggregation: How to share other’s content.”
How does duplicate content affect your search rankings?
Whether you realize it or not, chances are your website has at least some duplicate content. Perhaps it’s boilerplate/template elements that are used throughout your site, or maybe it’s excerpts taken from other sites. But how exactly does duplicate content affect a website’s search rankings?
Currently, duplicate content is more of a nuisance than something that is going to get your site penalized by Google.
It’s almost impossible to avoid duplicate content; in fact, duplicate content is among the top five SEO issues that sites face, especially now that Google has put its Panda Update into play.
Google has long stressed the importance of publishing unique, high-quality content, and for good reason:
content is the driving force behind every successful website or blog. It gives people a reason to visit the website, and it helps search engines identify what the website is about. However, websites that consist primarily of duplicate content may experience difficulties in achieving a top search ranking for their target keywords.
Specifically, Google’s duplicate content policy states:
In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.
A prime example of duplicate content negatively affecting a specific page on your site would be publishing a blog post and then republishing that same post on multiple other sites. While this practice won’t get you in trouble, it can weaken the authority of the page on your own site, and Google may prioritize the copy of your post featured on another site, ranking it higher in search results.
When Google finds identical instances of content, it chooses one of them to show. Which resource it displays in the search results depends upon the search query.
Should You Fear Duplicate or Scraped Content?
You don’t need to fear scraped content; you simply need to learn how to deal with duplicate content issues.
Again, you absolutely do NOT need to fear duplicate content.
You will definitely want to read Andy Crestodina’s article on blog.kissmetrics.com for a more detailed explanation of this to make you feel better, but:
Scrapers don’t help or hurt you. Do you think that a little blog in Asia with no original writing and no visitors confuses Google? No. It just isn’t relevant.
Personally, I don’t mind scrapers one bit. They usually take the article verbatim, links and all. The fact that they take the links is a good reason to pay attention to internal linking. The links on the scraped version pass little or no authority, but you may get the occasional referral visit.
The bottom line is that you shouldn’t worry too much about the negative effects of duplicate content. However, if you copy content from a site without permission (something Google frowns upon) or your site offers nothing but duplicate content, then it will be at risk, especially now that the Panda Update is dialed in to deal with this type of duplication.
Ultimately, if you partake in this practice, Google may not show your pages in the search results at all, or it may simply push your website off the first few pages of results.
What Steps Should You Take to Prevent Duplicate Content Issues?
These simple steps will help you prevent and fix duplicate content issues
1. Focus on optimizing your product pages
If you’re running an eCommerce site, it’s imperative to focus on your product pages in order to appear in Google Search results. To get the best keyword optimization for product pages, Content Analytics Inc. suggests taking advantage of technology to make sure you are implementing best practices for product page optimization. This optimization must happen simultaneously with your own website pages. Then monitor your digital presence, especially for your most popular products, making sure that the content listed on each site is unique.
2. Minimize similar content
When it comes to finding tips to minimize duplicate content issues, Google itself is an outstanding source. Google offers a number of steps that CMOs can use to proactively address duplicate content issues. One way that Google suggests reducing the risk of duplicate content is to minimize any similar content found across your web platforms. For example, if you have a number of pages with similar information, think about either expanding each page or combining them into one mega page. In the case of a clothing brand with separate pages for each type of garment, consider merging all of the pages devoted to trousers and pants into one, or fleshing out each page with more content to make it unique.
3. Eliminate technical reasons for duplicate content
There are also plenty of technical reasons why duplicate content can occur; in these instances, it has more to do with the URL than with the words written on the pages. For example, URLs like yourcompany.com, www.yourcompany.com/, and https://yourcompany.com may look very similar and may point to the same page, but search engine bots read them as different URLs. When those bots come across the same content on what they consider two different URLs, they treat it as duplicate content. The same problem happens with URLs that you create for tracking purposes: search engines see the tracking variants as separate URLs and can zap you for duplicate content. In these cases, pick one main URL and stick with it, and if necessary, redirect the similar URLs back to the one you want; this should eliminate the issue. For a deep dive into canonicalization to eliminate duplicate content and URL issues, check out this post.
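The "pick one main URL" rule can be made concrete in code. The sketch below normalizes URL variants to a single canonical form; the rules (prefer https, drop "www.", drop trailing slashes, strip common tracking parameters) are assumptions for illustration, and `yourcompany.com` is the article's placeholder domain, so adjust both to your own site's convention.

```python
# Sketch: collapse URL variants into one canonical form.
# Rules here (https, no "www.", no trailing slash, no tracking params)
# are illustrative assumptions, not a universal standard.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def canonicalize(url):
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]                       # drop the "www." prefix
    path = parts.path.rstrip("/") or "/"      # no trailing slash (keep root)
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query)
                       if k not in TRACKING_PARAMS])
    return urlunsplit(("https", host, path, query, ""))

variants = [
    "http://yourcompany.com",
    "http://www.yourcompany.com/",
    "https://yourcompany.com/?utm_source=newsletter",
]
print({canonicalize(u) for u in variants})  # {'https://yourcompany.com/'}
```

On a live site the same decision is usually enforced with 301 redirects and a `rel="canonical"` link, but a normalizer like this is handy for auditing which variants your analytics or crawl data actually contain.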
4. Duplicate content is common and fixable
You work hard as a CMO — the last thing you need is for Google to penalize you for duplicate content that you didn’t intend to send out. By optimizing your product pages and making sure they contain the most effective keywords, making tangible changes to your pages to make them as dissimilar as possible, and making sure your URLs are not the main problem, you can avoid the wrath of Google’s algorithms while maintaining a strong online presence.
While the threat of a Google penalty makes any website owner sit up a little straighter, it’s important to realize that duplicated content is more often the result of innocent oversight than of content blatantly copied for malicious gain. Copied content can be penalized algorithmically or even manually; ordinary duplicate content is not penalized, but that doesn’t mean it’s optimal for your website or rankings either. It always pays to be unique!