mowax Posted October 7, 2011

Hi,

I have quite a few crawl errors showing up in my Webmaster Tools. There are 8 HTTP errors, which I have pasted below. I guess it is something to do with language packs, but I don't have any installed.

URL    Detail    Detected
http://www.dura-tex.co.uk/-anti-fatigue-mats/41-coin-grip-rubber-matting-6mm.html&id_lang=2    Domain name not found    Sep 9, 2011
http://www.dura-tex.co.uk/-carpets-and-rugs/28-red-carpet-runner.html&id_lang=2    Domain name not found    Sep 9, 2011
http://www.dura-tex.co.uk/-anti-fatigue-mats/39-interlocking-rubber-mat.html&id_lang=2    Domain name not found    Sep 8, 2011
http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html&id_lang=1    Domain name not found    Sep 5, 2011
http://www.dura-tex.co.uk/63-heavy-duty-adhesive-spray.html&id_lang=1    Domain name not found    Sep 5, 2011
http://www.dura-tex.co.uk/53-luxury-soft-pile-carpet-tiles.html&id_lang=1    Domain name not found    Sep 5, 2011
http://www.dura-tex.co.uk/52-loop-pile-carpet-tiles.html&id_lang=1    Domain name not found    Sep 5, 2011
http://www.dura-tex.co.uk/48-swimming-pool-matting-swimming-pool-flooring.html&id_lang=1    Domain name not found    Sep 5, 2011

I also have 29 "Not found" errors. A typical example is this:

http://www.dura-tex....jM=Ub4UnkNbVuM=    404 (Not found)

I don't really know what to make of that, it's just gibberish! Can somebody please lend me a hand with this? I'm guessing website architecture is pretty crucial to good rankings.

Thanks for any replies.
CraigMeade Posted October 7, 2011

I can't open a good number of those pages either. It looks like at least some of these links are being forwarded to a new URL that is the product URL minus the domain, e.g. http://www.64-double-sided-reinforced-tape.html/

In some cases they forward to the product number only, e.g. http://www.41-.html/

So I would say this is an issue or set-up error based in your hosting rather than in Prestashop. Quite a complex one, though. My advice is to talk to your host. I don't see any way that P'shop could be doing this.

Best of luck, and do come back here and tell us what the problem was when you get it sorted.
mowax (Author) Posted October 7, 2011

Thanks for your reply Craig, I will check with bluehost.com and see what they can do. But I don't understand why there are &id_lang=2 variables in there, as I only have the English language pack installed, which is &id_lang=1. How can they be crawling pages that don't (or shouldn't) exist?
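One detail about the reported URLs that may be worth checking: they contain "&id_lang=" but no "?", so a standard URL parser treats the language parameter as part of the path rather than as a query string. The short Python sketch below illustrates this with one URL from the first post. The "well_formed" variant with a "?" is a hypothetical for comparison only, not a URL taken from the site, and the sketch does not explain where the malformed links come from.

```python
from urllib.parse import urlsplit, parse_qs

# URL exactly as reported in the crawl errors (note: "&" but no "?").
reported = "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html&id_lang=1"

# Hypothetical variant, for comparison only: the same parameter
# introduced by "?" as a normal query string would be.
well_formed = "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html?id_lang=1"


def describe(url: str) -> dict:
    """Split a URL and report its path and parsed query parameters."""
    parts = urlsplit(url)
    return {"path": parts.path, "query": parse_qs(parts.query)}


reported_parts = describe(reported)
well_formed_parts = describe(well_formed)

# In the reported URL the "&id_lang=1" text ends up inside the path,
# so the server is asked for a file whose name includes "&id_lang=1".
print(reported_parts)
print(well_formed_parts)
```

If the path printed for the reported URL still includes "&id_lang=1", that would be consistent with a rewrite rule or link template appending the parameter with the wrong separator, though that is only a guess from the URLs alone.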
CraigMeade Posted October 7, 2011

I'm stumped, and breaking out in a cold sweat imagining this happening to me. I feel your pain, Duratex.
dazzza Posted October 8, 2011

Your robots.txt is only showing a link to your sitemap. Try regenerating a new robots.txt in the back office. At the moment the search engines are being allowed to crawl everything, but they should be excluded from all of the following:

# Directories
Disallow: /classes/
Disallow: /config/
Disallow: /download/
Disallow: /mails/
Disallow: /modules/
Disallow: /translations/
Disallow: /tools/
Disallow: /lang-en/
# Files
Disallow: /addresses.php
Disallow: /address.php
Disallow: /authentication.php
Disallow: /cart.php
Disallow: /discount.php
Disallow: /footer.php
Disallow: /get-file.php
Disallow: /header.php
Disallow: /history.php
Disallow: /identity.php
Disallow: /images.inc.php
Disallow: /init.php
Disallow: /my-account.php
Disallow: /order.php
Disallow: /order-opc.php
Disallow: /order-slip.php
Disallow: /order-detail.php
Disallow: /order-follow.php
Disallow: /order-return.php
Disallow: /order-confirmation.php
Disallow: /pagination.php
Disallow: /password.php
Disallow: /pdf-invoice.php
Disallow: /pdf-order-return.php
Disallow: /pdf-order-slip.php
Disallow: /product-sort.php
Disallow: /search.php
Disallow: /statistics.php
Disallow: /attachment.php
Disallow: /guest-tracking
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*id_lang=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=

Note they are crawling & trying to index files ending "id_lang=" in
mowax (Author) Posted October 9, 2011

Thanks Dazzza, this sounds promising. I haven't got round to updating prestashop yet, and v1.1 doesn't have a robots.txt generator, so I'm just going to paste the above into a text file. Do I specify User-agents=* for all the above?
dazzza Posted October 10, 2011

Full robots.txt should be as follows:

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
User-agent: *
# Directories
Disallow: /classes/
Disallow: /config/
Disallow: /download/
Disallow: /mails/
Disallow: /modules/
Disallow: /translations/
Disallow: /tools/
Disallow: /lang-en/
# Files
Disallow: /addresses.php
Disallow: /address.php
Disallow: /cart.php
Disallow: /discount.php
Disallow: /footer.php
Disallow: /get-file.php
Disallow: /header.php
Disallow: /history.php
Disallow: /identity.php
Disallow: /images.inc.php
Disallow: /init.php
Disallow: /my-account.php
Disallow: /order.php
Disallow: /order-slip.php
Disallow: /order-detail.php
Disallow: /order-follow.php
Disallow: /order-return.php
Disallow: /order-confirmation.php
Disallow: /pagination.php
Disallow: /password.php
Disallow: /pdf-invoice.php
Disallow: /pdf-order-return.php
Disallow: /pdf-order-slip.php
Disallow: /product-sort.php
Disallow: /search.php
Disallow: /statistics.php
# Sitemap
Sitemap: http://www.dura-tex.co.uk/sitemap.xml
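Before uploading a hand-written robots.txt like the one in this post, it can help to check locally which URLs it would block. The Python sketch below uses the standard library's urllib.robotparser on an abbreviated copy of the rules (only a handful of the Disallow lines are included, to keep it short). The helper names build_parser and is_allowed are made up for this example. As far as I know, this parser only does plain prefix matching, so it suits a file without wildcard rules and should not be taken as a model of how Google evaluates "*" patterns.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules from the post above (assumption: a
# subset is enough for a quick check; paste the full file to test it all).
ROBOTS_TXT = """\
User-agent: *
# Directories
Disallow: /classes/
Disallow: /config/
Disallow: /modules/
# Files
Disallow: /cart.php
Disallow: /order.php
Disallow: /search.php
# Sitemap
Sitemap: http://www.dura-tex.co.uk/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
for url in (
    "http://www.dura-tex.co.uk/cart.php",
    "http://www.dura-tex.co.uk/modules/",
    "http://www.dura-tex.co.uk/31--fixing-underlay?n=10&id_category=31",
):
    print(url, "allowed" if is_allowed(parser, url) else "blocked")
```

With only prefix rules in place, category pages carrying query parameters are not caught by any Disallow line, which is the behaviour this version of the file is aiming for.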
mowax (Author) Posted October 11, 2011

I uploaded the new robots.txt file, but the old errors are still there, plus now there are 25 'URL restricted by robots.txt' errors! How is that even technically an error?? Do I need to wait a bit longer for the errors to disappear, or have I just made the problem worse?
mowax (Author) Posted October 11, 2011

An example of the new 'URL restricted by robots.txt' errors is:

http://www.dura-tex.co.uk/31--fixing-underlay?n=10&id_category=31    URL restricted by robots.txt
dazzza Posted October 12, 2011

I've just checked your robots.txt file & it's not the same as the one in my post #7. It's the original one, with these lines at the end:

Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*id_lang=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=

Your link above to the URL restricted by robots.txt, "http://www.dura-tex.co.uk/31--fixing-underlay?n=10&id_category=31", is being restricted because of the line "Disallow: /*n=". It uses the *, which denotes a wildcard, so every URL containing "n=" is being restricted!

Change your robots.txt to the one in post #7, then you'll just have to wait for Google to recrawl & index the correct URLs. You're right in saying Google flags 'URL restricted by robots.txt' as errors, but they are not errors, just restrictions.
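To see why "Disallow: /*n=" catches that category URL, here is a minimal Python sketch of wildcard matching as search engines such as Google document it: "*" matches any run of characters, a trailing "$" anchors the end of the URL, and everything else is a prefix match. The function names rule_to_regex and blocked_by are invented for this illustration, and this is a simplification rather than any crawler's actual implementation.

```python
import re

# The wildcard rules quoted in the post above.
WILDCARD_RULES = [
    "/*orderby=", "/*orderway=", "/*tag=", "/*id_currency=",
    "/*search_query=", "/*id_lang=", "/*back=", "/*utm_source=",
    "/*utm_medium=", "/*utm_campaign=", "/*n=",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    "*" becomes ".*"; a trailing "$" anchors the end; otherwise the
    rule only has to match a prefix of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def blocked_by(path: str, rules: list) -> list:
    """Return every rule (in order) that matches the path plus query."""
    return [rule for rule in rules if rule_to_regex(rule).match(path)]


# Path and query string of the URL flagged in Webmaster Tools.
category = "/31--fixing-underlay?n=10&id_category=31"
product = "/64-double-sided-reinforced-tape.html&id_lang=1"
plain = "/28--gym-flooring"

print(blocked_by(category, WILDCARD_RULES))
print(blocked_by(product, WILDCARD_RULES))
print(blocked_by(plain, WILDCARD_RULES))
```

Because "/*n=" has nothing between the wildcard and "n=", it matches any URL that contains "n=" anywhere after the first slash, including pagination parameters like "?n=10" on ordinary category pages.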
CartExpert.net Posted October 12, 2011

It seems you have a badly configured .htaccess; the links you posted redirect to non-existent domains or bad links. Also, the first 3 links do not comply with Prestashop's friendly URLs.

All the product links on this page are wrong: http://www.dura-tex.co.uk/28--gym-flooring

Did you change any files?
mowax (Author) Posted October 12, 2011

Cheers Dazzza, that's my robots.txt hopefully sorted. It's definitely time I updated prestashop, then I won't have to keep doing things manually and getting them wrong.

Thank you for your reply, CartExpert.net. My .htaccess file was also done manually. When you say that the links don't comply with friendly URLs, do you mean that the URLs should not include the product category? I had assigned these products to default categories in the BO. Should the default category be 'home' for all products when friendly URLs is turned on?
CartExpert.net Posted October 12, 2011

If the .htaccess was done manually, you should generate it automatically instead. When trying to view http://www.dura-tex.....html&id_lang=2, it is redirected to http://www.41-.html/, which suggests a bad redirect. Do remember that regular expressions require experience, and the smallest mistake can cause big problems (like the one you have).

Also, your sitemap is incorrect, so Google will keep getting the incorrect links whenever it downloads the sitemap.
mowax (Author) Posted October 14, 2011

Thanks for all your help guys. I'm going to have to update prestashop to sort these issues, as v1.1 doesn't have all the generator functions, so everything has been done manually and, by all accounts, badly. I just need to find a free weekend to do it, as I modified the default theme and so have to do it from scratch.