Sarak Posted May 17, 2013 (edited)
Recently I've noticed a site/crawler called "webextract.net" online at my store - http://www.totalfancydress.com. After some research, I found that it's a site scraper that crawls your website and copies designs, product info and images for the purpose of duplicating and/or reselling them. In my robots.txt file, the secure pages are disallowed for all user agents. But how do I prevent scraping sites from accessing my site at all without blocking every other agent, such as Google?
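For context, a robots.txt along the lines described might look like the sketch below. The "webextract" user-agent token and the /secure/ path are assumptions (the real values would come from your own logs and site layout), and a scraper is free to ignore robots.txt entirely, which is why the replies below focus on server-side blocking instead:

    # Politely ask the assumed scraper to stay out (only honoured by well-behaved bots)
    User-agent: webextract
    Disallow: /

    # Everyone else (Google etc.) may crawl, except the secure pages
    User-agent: *
    Disallow: /secure/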
vekia Posted May 17, 2013
You can block the IP address in your .htaccess file or by using the free "block ip address" module.
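A minimal .htaccess sketch along those lines, assuming Apache 2.4 with mod_authz_core and mod_rewrite enabled and a host that allows these overrides; the IP address and the "webextract" user-agent string are placeholders, since the scraper's real values would have to be taken from your access logs:

    # Block a specific scraper IP (placeholder address - use the one from your logs)
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.45
    </RequireAll>

    # Return 403 Forbidden when the User-Agent contains "webextract" (assumed string)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} webextract [NC]
    RewriteRule .* - [F,L]

Blocking by IP stops a single machine, while the user-agent rule catches the crawler wherever it connects from, as long as it keeps identifying itself honestly.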
Nikolai Posted June 11, 2015
It is not possible to block web scraping completely if the person on the other side really wants to scrape your site. They will always find a way to do it:
- long delays between requests
- using proxies
- anti-captcha services
johnwolf Posted June 11, 2015
Contact your hosting provider and ask them to block the IPs the scrapers are coming from.