
Block (ban, redirect) spiders (bots) by user agent



I want to block / redirect certain bots like Baidu. When searching how to do this with Prestashop, I saw some threads with suggestions to do it with the robots.txt file. My understanding is that many of these bots do not respect the robots.txt file, so this advice is useless.

 

I also understand bots can be blocked with the .htaccess file, but I have read that this adds server load.

 

I have a forum where I implemented a modification where bots can be blocked and redirected by user agent. Meaning a unique part of the UA string is added to the block list, and any bot that has this in its UA gets redirected.

 

Is there a PS module with this functionality?

Edited by Dan1 (see edit history)

Hi,

If you're avoiding both robots.txt and .htaccess solutions, instead of bloating your PS installation with another module just for this purpose, you can just hack your index.php file by adding:

if ($_SERVER['HTTP_USER_AGENT'] == 'baidu...') {
header('location:404.html');
die;
}

In that case you avoid loading PS unnecessarily for those bots.


Thank you.

 

How would I change this code to redirect to an external site?

 

Why is there ... after baidu? Will this block all instances of baidu, like: baiduspider, botbaidu etc.?

 

How do I change the code you provided to add more bots?
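
For what it's worth, here is a minimal sketch of how those three questions could be handled together. It assumes the check sits at the very top of the shop's entry index.php, before anything is output. The names findBlockedAgent, $blockedAgents and $redirectTo, and the example.com URL, are illustrative and not part of PrestaShop. It uses stripos, so a fragment such as 'baidu' matches any User-Agent that contains it, in any letter case.

```php
<?php
// Hypothetical block list: each entry is a fragment of the User-Agent
// header, not the whole header. Matching is case-insensitive.
$blockedAgents = array('baidu', 'ahrefs');

// Hypothetical redirect target; a relative path such as '404.html' also works.
$redirectTo = 'http://www.example.com/';

// Returns the first fragment found inside $userAgent, or null when none matches.
function findBlockedAgent($userAgent, $fragments)
{
    foreach ($fragments as $fragment) {
        if (stripos($userAgent, $fragment) !== false) {
            return $fragment;
        }
    }
    return null;
}

// Only act when the client actually sent a User-Agent header.
if (isset($_SERVER['HTTP_USER_AGENT'])
    && findBlockedAgent($_SERVER['HTTP_USER_AGENT'], $blockedAgents) !== null) {
    header('Location: ' . $redirectTo);
    exit;
}
```

To add more bots, append another fragment to $blockedAgents. Keep the fragments specific: something as short as 'bot' would also catch crawlers you may want to keep.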


  • 2 weeks later...

It doesn't work in my PS 1.3.2.3.

I modified my index.php using the address I found in my log:

 

if ($_SERVER['HTTP_USER_AGENT'] == 'http://www.baidu.com/search/spider.html') {
header('location:404.html');
die;
}

 

 

Is there a syntax mistake, perhaps?

Thanks
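
A likely cause, sketched below: == compares the whole User-Agent header against the string, and the URL in the log is normally only one part of that header. The sample header is in the format Baiduspider commonly sends and is an assumption here; the exact string in your own logs may differ. $sampleAgent and $logUrl are illustrative names.

```php
<?php
// Assumed sample header; check your own access log for the real value.
$sampleAgent = 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)';
$logUrl = 'http://www.baidu.com/search/spider.html';

// Exact comparison: the header is longer than the URL, so this is false.
$exactMatch = ($sampleAgent == $logUrl);

// Substring comparison: the URL occurs inside the header, so this is true.
$partialMatch = (stripos($sampleAgent, $logUrl) !== false);

var_dump($exactMatch, $partialMatch);
```

So the syntax is valid PHP; the condition simply never becomes true unless the header consists of the URL and nothing else.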


Bots have different user agents, different referrer URLs, and different IP addresses; this is the main problem. The best way to block unwanted bots is to block their IP addresses using the $_SERVER['REMOTE_ADDR'] variable.
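
A rough sketch of an IP check, with some assumptions stated up front: the visitor's address is read from $_SERVER['REMOTE_ADDR'] ($_SERVER['SERVER_ADDR'] holds the server's own address, so it cannot identify a bot), only IPv4 is handled, and the addresses shown are placeholders from the documentation ranges, not real bot IPs. The names ipInCidr, isBlockedIp, $blockedIps and $blockedRanges are hypothetical.

```php
<?php
// Placeholder values; replace them with addresses taken from your own logs.
$blockedIps = array('192.0.2.10', '203.0.113.7');
$blockedRanges = array('198.51.100.0/24');

// True when the IPv4 address $ip falls inside the CIDR block $cidr.
function ipInCidr($ip, $cidr)
{
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2) {
        return false; // not written as network/bits
    }
    $ipLong = ip2long($ip);
    $netLong = ip2long($parts[0]);
    $bits = (int) $parts[1];
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false; // invalid IPv4 address or prefix length
    }
    // Keep only the leading $bits bits of each address before comparing.
    $mask = -1 << (32 - $bits);
    return ($ipLong & $mask) === ($netLong & $mask);
}

// True when $ip is listed exactly or falls inside one of the ranges.
function isBlockedIp($ip, $ips, $ranges)
{
    if (in_array($ip, $ips, true)) {
        return true;
    }
    foreach ($ranges as $range) {
        if (ipInCidr($ip, $range)) {
            return true;
        }
    }
    return false;
}

// Refuse the request before the rest of the shop is loaded.
if (isset($_SERVER['REMOTE_ADDR'])
    && isBlockedIp($_SERVER['REMOTE_ADDR'], $blockedIps, $blockedRanges)) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```

Ranges matter because, as noted above, the same bot usually arrives from many addresses, so a list of single IPs tends to go stale quickly.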

 

Thanks for the answer, I'm trying to block several IP addresses, however I don't understand how to do it.

Could you tell me the exact syntax for these bots that appear in my log files, please?

 

http://www.baidu.com/search/spider.htm
http://www.bing.com/bingbot.htm
http://ahrefs.com/robot/

 

All of them are creating phantom carts using different IP addresses.

Thanks!

Edited by c.carlos.s (see edit history)

Well, I keep trying to modify my index.php file with different strings. I realized I have two different index.php files:

The first one is at www.myshop/index.php, with this code:

<?php

header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");

header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");

header("Location: ../");
exit;
?>

 

The second one is at www.myshop/themes/mytheme/index.php:

<?php

include(dirname(__FILE__).'/config/config.inc.php');

if(intval(Configuration::get('PS_REWRITING_SETTINGS')) === 1)
$rewrited_url = __PS_BASE_URI__;

include(dirname(__FILE__).'/header.php');

$smarty->assign('HOOK_HOME', Module::hookExec('home'));
$smarty->display(_PS_THEME_DIR_.'index.tpl');

include(dirname(__FILE__).'/footer.php');

?>

 

Do you know which one I must modify with:

if ($_SERVER['HTTP_USER_AGENT'] == 'different expressions containing BAIDU, AHREFS, BING...') {
header('location:404.html');
die;
}

 

Thanks a lot!
