Content Scrapers: How To Handle Them? Can They Hurt Me?
Content scrapers are blogs or websites that use RSS feeds to pull in all or part of their content from other sites. Since the search engines prize unique, original content above all else, this might sound like a bad idea, especially if it's your blog these scrapers are pulling from. But content scrapers do have a valid function in the grand scheme of things, and they generally don't hurt your rankings or your WordPress SEO as a whole.
What is Content Scraping?
You've probably seen more than one blog that uses scraped content. You'll see a headline and a paragraph or two on XYZ blog, and when you click the "Read More" link, you land on a different blog. XYZ blog is the one scraping content: it's pulling posts in from other blogs, most likely through their RSS feeds.
Why Use Scraped Content?
Lots of blogs use scraped content. In fact, if you're blogging on WordPress, there's already a built-in widget for pulling in RSS feeds, so you can set one up in a matter of seconds. But why would you want to?
Well, you can't be an expert on everything, and even if you were, you wouldn't have time to blog about everything. So lots of bloggers use scraping to pull in content their readers want from other sources. For example, if you blog about politics, you might pull in the headlines from Google News.
Some blogs are nothing but scraped content. That might seem shady, but done properly it's a valuable service for readers: everything they're looking for, all in one place, like an online directory. Other bloggers who publish across multiple platforms (their own blog, Squidoo, HubPages, and guest posts) use scraping to curate all of their content in one location so their followers can find everything more easily.
In a way, you're already scraping content with your RSS feed reader. You're pulling the content from your favorite blogs into one location; you're curating content. Set up a page on your blog, publish all those feeds there, and you'd be a content scraper, too. And don't laugh: I've done it with my own blogs. I'm working on my blog all day anyway, so it's easier to have my feed reader right there, and my readers like the additional resource.
Does It Hurt Your Rankings When Someone Scrapes Your Content?
In most cases, reputable bloggers will pull only your title, or your title and an excerpt, and then link directly to your article. In that case, you don't have to worry about your rankings, because they're not publishing the complete article. In fact, it's beneficial, because it's a link back to your blog, and if you're really lucky, that other blogger is using dofollow links (ordinary links without a rel="nofollow" attribute).
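To make the "title plus excerpt plus link" pattern concrete, here is a minimal Python sketch of roughly what a feed widget does behind the scenes. It assumes a standard RSS 2.0 feed with `<item>`, `<title>`, `<link>` and `<description>` elements; the sample feed, the example.com URL, and the helper names `make_excerpt`, `build_excerpt_list` and `render_html` are all hypothetical. Real widgets also handle fetching, caching and malformed feeds, which this sketch skips.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched from the source blog's RSS URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Five Tips for Better Headlines</title>
      <link>https://example.com/better-headlines</link>
      <description>&lt;p&gt;Headlines are the first thing readers see, so they deserve real attention before you hit publish.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def make_excerpt(description: str, max_words: int = 12) -> str:
    """Strip HTML tags from a post description and keep only the first max_words words."""
    text = html.unescape(re.sub(r"<[^>]+>", "", description))
    words = text.split()
    excerpt = " ".join(words[:max_words])
    # Mark the excerpt as truncated so readers know to click through.
    return excerpt + ("..." if len(words) > max_words else "")


def build_excerpt_list(feed_xml: str, max_words: int = 12) -> list[dict]:
    """Return the title, a short excerpt, and the original link for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default="").strip(),
            "excerpt": make_excerpt(item.findtext("description", default=""), max_words),
            "link": item.findtext("link", default="").strip(),
        })
    return entries


def render_html(entries: list[dict]) -> str:
    """Render each entry as a headline, an excerpt, and a 'Read More' link back to the source."""
    parts = []
    for e in entries:
        parts.append(
            f'<h3>{html.escape(e["title"])}</h3>\n'
            f'<p>{html.escape(e["excerpt"])} '
            f'<a href="{html.escape(e["link"])}">Read More</a></p>'
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_html(build_excerpt_list(SAMPLE_FEED)))
```

The design choice that matters for SEO is that only a truncated excerpt is republished and every entry keeps a link to the original article, so readers have to click through to read the full post.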
As long as content scrapers link back to the original article and credit their source, the search engines generally don't treat them as a problem, even if they publish your complete article. Nor does this usually trigger a duplicate content penalty: Google has said that duplicate content isn't grounds for action unless it appears intended to deceive users or manipulate search results. In ordinary cases, Google simply chooses one version of the page to show.
Google generally tries to credit the original publisher. However, over time the scraper site could end up ranking higher than your blog, especially if it has higher PageRank and authority and your article on its blog earns a lot of natural backlinks. If that happens, keep two things in mind. First, if the scraper publishes only an excerpt, readers will eventually have to click through to your blog to read the whole article, so relax and enjoy the extra traffic.
Second, you can counteract this to a certain extent by claiming authorship, something you should do with all of your online content. You’ll need a Google account so you can link to all of your online profiles and claim ownership of your content.
What about unscrupulous bloggers who scrape your entire article and remove your name and link? If possible, contact the blog owner and ask them either to publish only an excerpt with a proper link back or to remove your content altogether. Give them 24 hours at most to respond, and if they don't, file a DMCA takedown request with Google, which can remove the copied page from its search results.