Sarak Posted May 17, 2013 (edited)
Recently I've noticed a site/crawler called "webextract.net" online at my store - http://www.totalfancydress.com. After some research, I found that it's a site scraper that crawls your website and copies designs, product info and images for the purpose of duplicating and/or reselling them. In my robots.txt file, the secure pages are disallowed for all user agents. But how do I prevent scraping sites from accessing my site at all without blocking every other agent, such as Google?
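For context, a robots.txt along the lines described might look like the sketch below. The "webextract" user-agent token and the /secure/ path are assumptions (the real values would come from your own logs and site layout), and a scraper is free to ignore robots.txt entirely, which is why the replies below focus on server-side blocking instead:

    # Politely ask the assumed scraper to stay out (only honoured by well-behaved bots)
    User-agent: webextract
    Disallow: /

    # Everyone else (Google etc.) may crawl, except the secure pages
    User-agent: *
    Disallow: /secure/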
vekia Posted May 17, 2013
You can block the IP address in your .htaccess file or by using the free "block ip address" module.
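A minimal .htaccess sketch along those lines, assuming Apache 2.4 with mod_authz_core and mod_rewrite enabled and a host that allows these overrides; the IP address and the "webextract" user-agent string are placeholders, since the scraper's real values would have to be taken from your access logs:

    # Block a specific scraper IP (placeholder address - use the one from your logs)
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.45
    </RequireAll>

    # Return 403 Forbidden when the User-Agent contains "webextract" (assumed string)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} webextract [NC]
    RewriteRule .* - [F,L]

Blocking by IP stops a single machine, while the user-agent rule catches the crawler wherever it connects from, as long as it keeps identifying itself honestly.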
Nikolai Posted June 11, 2015
It is not possible to block web scraping completely if the person on the other side really wants to scrape your site. They will always find a way to do it:
- long delays between requests
- using proxies
- anti-captcha services
johnwolf Posted June 11, 2015
Contact your hosting provider and ask them to block the IPs the scrapers are coming from.