Waging My War Against Blog Content Scrapers


My blog wasn’t even cleaned, swaddled and nursed before someone stole it out from under me.

There it was – my first organic backlink!

I eagerly clicked on the WordPress pingback notification. What lucky reader had discovered my genius and was eagerly drinking up my wisdom and directing others to do the same?

It was a stolen reproduction of my entire article: text, icons, images, affiliate links, everything. Lock, stock n’ barrel.

Spam Backlink

At the bottom of the page was just a simple sentence stating, “This article originally appeared in AskTheRVEngineer.” The webmaster had even removed the blue color and underline of the link.

Dirty schmuck.

Actually, I was lucky. They could have scraped my content and never posted a backlink. It might have been months before I discovered the rip-off.

“They don’t know who they’re messing with!” I ranted to my wife. “They are dead! So dead! How dare they? How DARE THEY STEAL MY STUFF!

“I WILL BURN THEM TO THE GROUND!”

Now, some reuse of copyrighted material is allowed under the Fair Use doctrine. But copying and pasting an entire article, even with attribution, doesn’t qualify.

In fact, blog posts are protected by copyright, and the Digital Millennium Copyright Act (DMCA) gives you a way to get stolen copies taken down.

What I found out is that I can complain to the domain name registrar and submit a notice. So long as I have the offending URL, proof that I’m the original publisher, and a few other smaller things, I can submit a DMCA take-down report.

So that’s what I did.

How to Submit a DMCA Take Down Notice

I found the domain name registrar through a WHOIS lookup. And bam! There it was!

Abuse Registrar Email
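If you'd rather not squint at a wall of WHOIS text, here's a minimal Python sketch that pulls out the registrar and its abuse contacts. It assumes the lookup output uses the common "Registrar:" and "Registrar Abuse Contact Email:" labels; some registries format things differently, so treat it as a starting point. The sample record, domain, and function name are made up for illustration.

```python
# Hypothetical sample of WHOIS output; real records are much longer.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE-SCRAPER.COM
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Email: abuse@registrar.example
Registrar Abuse Contact Phone: +1.5555550100
"""

# Labels assumed to appear in the WHOIS text (not guaranteed for every TLD).
WANTED_LABELS = {
    "registrar": "registrar",
    "abuse_email": "registrar abuse contact email",
    "abuse_phone": "registrar abuse contact phone",
}


def find_registrar_contacts(whois_text):
    """Return the registrar name and abuse contacts found in raw WHOIS text."""
    found = {}
    for line in whois_text.splitlines():
        # Split on the first colon only, so values keep any later colons.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        label = label.strip().lower()
        for name, wanted in WANTED_LABELS.items():
            # Keep the first match for each field and ignore repeats.
            if label == wanted and name not in found:
                found[name] = value.strip()
    return found


if __name__ == "__main__":
    for field, value in find_registrar_contacts(SAMPLE_WHOIS).items():
        print(f"{field}: {value}")
```

Paste your own lookup result in place of the sample, and the abuse email is the address your DMCA notice goes to.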

You can either send a DMCA form to the registrar abuse email, or you can submit a DMCA takedown through the registrar’s support center. Here’s the one I used at NameCheap.

But I wasn’t done. I was like Tony Montana in Scarface.

How to Submit a Copyright Infringement Report Through Google 

Turns out, you can also submit a copyright infringement complaint to Google through your Search Console.

Did that, too.

How to Contact a Webmaster About Scraped Content

Lastly, I added the cherry on top.

I wrote the webmaster an email.*

*Yes, you can copy and paste the “private” email address shown in ICANN lookup. Your emails will still get delivered to a real address.

 

Webmaster,

yourcrummywebsite.com is re-publishing original content from askthervengineer.com.

Cease immediately and delete all our original content.

We have submitted two formal DMCA reports with your registrar, NameCheap.

We have also submitted spam reports via Google Search console.

Respond with verification of content deletion.

Regards,

A day later, I received an email response:

You can also just send me an email before and I would remove your contents…

“What a novel idea!” I thought to myself. “Why didn’t I think of that beforehand?

“Oh, yeah, that’s right. Because your home page doesn’t have an About or Contact page, your blog is a hidden subdomain, there’s a grand total of zero phone numbers, emails or addresses on your spammy lead generation site, and because your entire blog website is made up of scraped content used without permission from the owner!”

Admission of the crime isn’t acquittal.

If someone smashes my front door, steals my vintage Nintendo 64, takes it back to their house and posts a sign on their front door saying, “I stole this Nintendo 64 from Ross’s house,” that doesn’t make it ok!

But good news … the webmaster did remove my content.

And turns out, that’s often the most effective way to stop scraping. Spammers work hard to create these websites. They don’t want to wind up with a Google penalty. Threaten them with a DMCA take down notice, or just ask them to remove your content, and they’ll likely do it.

Victory!

How to Turn Off WordPress RSS Feed

Then I started to wonder, “How did they steal my stuff?”

Turns out, a lot of scraper sites steal your content using RSS feeds.

Now, I didn’t even know what an RSS feed was. (I’m a Millennial, after all.)

Basically, it’s an XML file that allows a user to read a website’s content without actually accessing the website. Sort of a holdover from the 1990s.

As a blogger focused on monetizing my site, that sounds completely backwards.

  • We spend hours designing a website and interlinking pages to keep visitors on our sites.
  • We build email campaigns and lead magnets to entice email subscribers.
  • We design click-through sales funnels to convert leads into customers.

… And the RSS feed just bypasses all that!

In fact, some spammy curation sites have argued that just enabling an RSS feed of your work is an implicit license to redistribute!

WordPress, the Mother Hen of all bloggers, spits out a full-text RSS feed by default.

You can view it at:

 http://[yoursitedomainname].com/?feed=rss

….. What?!!!

I don’t want RSS readers to just copy and paste my content all over the web!

So I asked the Big Question: Does anyone even use RSS anymore?

Google unearthed every answer under the sun: yes, no, what’s RSS, I read 200 articles a day via RSS, don’t you dare take away my RSS, email is better than RSS for conversions, etc.

Since no one was arguing that A) RSS feeds put more money in their pocket or B) users would freak out without an RSS feed, I set out to disable it.

In my WordPress dashboard, I navigated to Settings >> Reading. You can choose a full text or truncated (summary) text for your RSS feed. I chose truncated.

WordPress RSS Settings

The theory is that autoblogging software can’t derive a full web page from a summary RSS feed.
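If you want to double-check what your own feed is handing out, here's a rough Python sketch. It assumes the feed is RSS 2.0 and that full-text posts show up in a `<content:encoded>` element, which is how WordPress feeds normally carry the article body as far as I can tell. The two sample feeds and the function name are invented for the demo.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <content:encoded> element in RSS 2.0 feeds.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical full-text feed: the whole article rides along in the XML.
FULL_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><title>Demo Blog</title>
<item><title>Post A</title>
<description>Short teaser</description>
<content:encoded><![CDATA[<p>The entire article body.</p>]]></content:encoded>
</item>
</channel></rss>"""

# Hypothetical summary feed: only the teaser is exposed.
SUMMARY_FEED = """<rss version="2.0">
<channel><title>Demo Blog</title>
<item><title>Post A</title>
<description>Short teaser</description>
</item>
</channel></rss>"""


def full_text_items(feed_xml):
    """Return titles of feed items that carry a non-empty full-text body."""
    root = ET.fromstring(feed_xml)
    exposed = []
    for item in root.iter("item"):
        body = item.find(CONTENT_NS + "encoded")
        if body is not None and (body.text or "").strip():
            exposed.append(item.findtext("title", default="(untitled)"))
    return exposed


if __name__ == "__main__":
    print("Full feed exposes:", full_text_items(FULL_FEED))
    print("Summary feed exposes:", full_text_items(SUMMARY_FEED))
```

An empty list means the feed is only giving away teasers. Save your real feed to a file and pass its contents in to test your own site.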

I congratulated myself for outsmarting the bots.

…. But then I found a piece of software that promised content thieves it could recreate a full web page from truncated RSS.

“D***,” I thought. “Can I completely turn off WordPress RSS?”

With the right plug-in or custom code, yes, you can.

But turns out, there’s software that can steal your content even without an RSS feed, too.

  • Web crawlers can parse a page’s HTML for recreation.
  • Humans can right-click and copy-paste.
  • Bots can create image snippets of text-based posts.

In fact, the more I read, the more I realized I was fighting a losing battle. There’s no fool-proof method to stop all copying. The bots are smarter than me. Any post I write can and will be used against me.

How to Discourage Blog Content Scrapers

So I moved on to my third and final strategy: If you can’t beat ‘em, annoy the heck out of them.

1) Interlink!

First of all, interlink interlink INTERLINK! If your content is constantly redirecting your readers to helpful articles on YOUR site, then you stand a good chance of redirecting traffic from the scraper’s website, too!

Interlinking Example
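One catch: a relative link like /some-post/ will point at the scraper's domain once the HTML is copied, so only absolute links send readers back to you. Here's a small Python sketch for auditing a post's links. The sample HTML, the post slugs, and the helper names are all hypothetical; swap in your own domain and exported post HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAudit(HTMLParser):
    """Sort a post's links into absolute internal, relative, and external."""

    def __init__(self, own_domain):
        super().__init__()
        self.own_domain = own_domain.lower()
        self.absolute_internal = []
        self.relative = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs, in-page anchors, and mail/phone links.
        if not href or href.startswith(("#", "mailto:", "tel:")):
            return
        host = urlparse(href).netloc.lower()
        if not host:
            # No host: this link will resolve against whoever serves the page.
            self.relative.append(href)
        elif host == self.own_domain or host.endswith("." + self.own_domain):
            self.absolute_internal.append(href)
        else:
            self.external.append(href)


def audit_links(html, own_domain):
    """Parse the HTML and return the populated LinkAudit."""
    parser = LinkAudit(own_domain)
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical post body with one link of each kind.
POST_HTML = """
<p>See my <a href="https://askthervengineer.com/battery-guide/">battery guide</a>
and the <a href="/solar-basics/">solar basics</a> post.
Parts came from <a href="https://example.com/store">a store</a>.</p>
"""

if __name__ == "__main__":
    audit = audit_links(POST_HTML, "askthervengineer.com")
    print("Survive scraping:", audit.absolute_internal)
    print("Lost if scraped:", audit.relative)
    print("External:", audit.external)
```

Anything that lands in the relative list is worth rewriting as a full URL.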

2) Claim Copyright!

I edited my footer to read Copyright 2021 – My Company Name – All Rights Reserved. This is superficial protection. Just by being the original author, I’m automatically afforded copyright protection. But a little extra never hurt, right?

3) Get a DMCA Protection Badge

I signed up with a freemium DMCA Protection service. As I grow, I’ll probably invest in a monthly protection and monitoring plan.

If you have an SEO all-in-one tool like Moz or Ahrefs, you can manually check for spammy backlinks yourself.

Anyway, I now have a DMCA-protected badge for my website content. Looks like this.

DMCA Badge Notice Example

Is it perfect? Definitely not! But now if someone steals my content, I’m at least a thorn in their side.

Solution: Just Be a Little Faster than the Next Guy

Here are my basic thoughts.

Stopping the content scrapers is like surviving the zombie apocalypse: You just have to be a little bit faster than the guy next to you.

Content scraping and autoblogging are big business. I can’t stop it 100%.

But if I can beat ’em at their own game, they’ll probably move on to someone else. And if they still insist on stealing my stuff, at least I’ll get some backlinks out of it.

If you don't share this post, a baby hamster will die.