
[SOLVED] Crawl errors in Google Webmaster Tools



Hi,

 

I have quite a few crawl errors showing up in Google Webmaster Tools. There are 8 HTTP errors, which I have pasted below. I guess it is something to do with language packs, but I don't have any installed.

 

URL | Detail | Detected
http://www.dura-tex.co.uk/-anti-fatigue-mats/41-coin-grip-rubber-matting-6mm.html&id_lang=2 | Domain name not found | Sep 9, 2011
http://www.dura-tex.co.uk/-carpets-and-rugs/28-red-carpet-runner.html&id_lang=2 | Domain name not found | Sep 9, 2011
http://www.dura-tex.co.uk/-anti-fatigue-mats/39-interlocking-rubber-mat.html&id_lang=2 | Domain name not found | Sep 8, 2011
http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html&id_lang=1 | Domain name not found | Sep 5, 2011
http://www.dura-tex.co.uk/63-heavy-duty-adhesive-spray.html&id_lang=1 | Domain name not found | Sep 5, 2011
http://www.dura-tex.co.uk/53-luxury-soft-pile-carpet-tiles.html&id_lang=1 | Domain name not found | Sep 5, 2011
http://www.dura-tex.co.uk/52-loop-pile-carpet-tiles.html&id_lang=1 | Domain name not found | Sep 5, 2011
http://www.dura-tex.co.uk/48-swimming-pool-matting-swimming-pool-flooring.html&id_lang=1 | Domain name not found | Sep 5, 2011

I also have 29 "Not found" errors. A typical example is this:

 

http://www.dura-tex....jM=Ub4UnkNbVuM= 404 (Not found)

 

I don't really know what to make of that, it's just gibberish!

 

 

Can somebody please lend me a hand with this? I'm guessing website architecture is pretty crucial to good rankings.

 

Thanks for any replies


I can't open a good number of those pages either.

It looks like at least some of these links are being forwarded to a new URL that consists of the product part of the URL without the domain name.
E.g. http://www.64-double-sided-reinforced-tape.html/

In some cases they are forwarded to the product number only.
E.g. http://www.41-.html/

So I would say this is a set-up error in your hosting rather than in PrestaShop, and quite a complex one. My advice is to talk to your host. I don't see any way that PrestaShop could be doing this.
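
The forwarded addresses above would also be consistent with the "Domain name not found" detail in the report: in a URL such as http://www.41-.html/ the product slug sits where the hostname should be, so a crawler ends up trying to resolve a domain that does not exist. A minimal Python sketch of that check follows. The names `EXPECTED_HOST` and `redirect_leaves_site` are made up for illustration, and the sketch only inspects the URL text; it does not follow any redirects itself.

```python
from urllib.parse import urlparse

# Assumption: this is the hostname the shop is meant to be served from.
EXPECTED_HOST = "www.dura-tex.co.uk"


def redirect_leaves_site(target_url, expected_host=EXPECTED_HOST):
    """Return True when a redirect target points at a different hostname."""
    host = urlparse(target_url).hostname or ""
    return host.lower() != expected_host.lower()


if __name__ == "__main__":
    # The first two targets are the forwarded addresses quoted in this post.
    for target in (
        "http://www.64-double-sided-reinforced-tape.html/",
        "http://www.41-.html/",
        "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html",
    ):
        host = urlparse(target).hostname
        print(target, "-> host:", host, "| leaves site:", redirect_leaves_site(target))
```

Running this against the redirect targets seen in a browser is a quick way to confirm that the forwarding, and not the original product URL, is what sends the crawler to a non-existent domain.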

 

 

Best of luck and do come back to here and tell us what the problem was when you get it sorted.


Thanks for your reply Craig, I will check with bluehost.com and see what they can do. But I don't understand why there are &id_lang=2 variables in there, as I only have the English language pack installed, which is &id_lang=1. How can they be crawling pages that don't (or shouldn't) exist?
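
One detail worth noting about the reported URLs is that `&id_lang=` follows `.html` directly, with no `?` in front of it. A standard URL parser therefore treats it as part of the path rather than as a query parameter. The short Python sketch below shows the difference; `split_url` is a hypothetical helper name, and the second URL (with `?`) is only there for comparison, not something taken from the crawl report.

```python
from urllib.parse import urlparse, parse_qs


def split_url(url):
    """Return (path, query parameters) as a standard URL parser sees them."""
    parts = urlparse(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    # URL exactly as listed in the crawl errors.
    reported = "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html&id_lang=1"
    # Hypothetical comparison: the same parameter introduced with '?'.
    with_question_mark = "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html?id_lang=1"

    print(split_url(reported))
    print(split_url(with_question_mark))
```

In the reported form the whole string `/64-double-sided-reinforced-tape.html&id_lang=1` is the path, which may be why a rewrite rule that expects the path to end in `.html` mishandles it.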


Your robots.txt only contains a link to your sitemap. Try generating a new robots.txt in the back office.

At the moment the search engines are allowed to crawl everything, but they should be excluded from all of the following:

 

# Directories
Disallow: /classes/
Disallow: /config/
Disallow: /download/
Disallow: /mails/
Disallow: /modules/
Disallow: /translations/
Disallow: /tools/
Disallow: /lang-en/
# Files
Disallow: /addresses.php
Disallow: /address.php
Disallow: /authentication.php
Disallow: /cart.php
Disallow: /discount.php
Disallow: /footer.php
Disallow: /get-file.php
Disallow: /header.php
Disallow: /history.php
Disallow: /identity.php
Disallow: /images.inc.php
Disallow: /init.php
Disallow: /my-account.php
Disallow: /order.php
Disallow: /order-opc.php
Disallow: /order-slip.php
Disallow: /order-detail.php
Disallow: /order-follow.php
Disallow: /order-return.php
Disallow: /order-confirmation.php
Disallow: /pagination.php
Disallow: /password.php
Disallow: /pdf-invoice.php
Disallow: /pdf-order-return.php
Disallow: /pdf-order-slip.php
Disallow: /product-sort.php
Disallow: /search.php
Disallow: /statistics.php
Disallow: /attachment.php
Disallow: /guest-tracking
Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*id_lang=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=

Note that they are crawling and trying to index URLs that end with an "id_lang=" parameter.


Thanks Dazzza, this sounds promising. I haven't got round to updating PrestaShop yet, and v1.1 doesn't have a robots.txt generator, so I'm just going to paste the above into a text file. Do I specify User-agents=* for all the above?

Link to comment
Share on other sites

Full robots.txt should be as follows:

# robots.txt automaticaly generated by PrestaShop e-commerce open-source solution
# http://www.prestashop.com - http://www.prestashop.com/forums
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
User-agent: *
# Directories
Disallow: /classes/
Disallow: /config/
Disallow: /download/
Disallow: /mails/
Disallow: /modules/
Disallow: /translations/
Disallow: /tools/
Disallow: /lang-en/
# Files
Disallow: /addresses.php
Disallow: /address.php
Disallow: /cart.php
Disallow: /discount.php
Disallow: /footer.php
Disallow: /get-file.php
Disallow: /header.php
Disallow: /history.php
Disallow: /identity.php
Disallow: /images.inc.php
Disallow: /init.php
Disallow: /my-account.php
Disallow: /order.php
Disallow: /order-slip.php
Disallow: /order-detail.php
Disallow: /order-follow.php
Disallow: /order-return.php
Disallow: /order-confirmation.php
Disallow: /pagination.php
Disallow: /password.php
Disallow: /pdf-invoice.php
Disallow: /pdf-order-return.php
Disallow: /pdf-order-slip.php
Disallow: /product-sort.php
Disallow: /search.php
Disallow: /statistics.php
# Sitemap
Sitemap: http://www.dura-tex.co.uk/sitemap.xml
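
Since the file is being pasted together by hand, it may be worth checking it locally before uploading. The Python sketch below uses the standard library's `urllib.robotparser` on a shortened copy of the rules above. `ROBOTS_TXT` and `is_allowed` are illustrative names, and the sketch only covers plain path rules like the ones in this file.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above; paste the full file here to test it.
ROBOTS_TXT = """\
User-agent: *
# Directories
Disallow: /classes/
Disallow: /modules/
# Files
Disallow: /cart.php
Disallow: /order.php
# Sitemap
Sitemap: http://www.dura-tex.co.uk/sitemap.xml
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.dura-tex.co.uk/64-double-sided-reinforced-tape.html",
        "http://www.dura-tex.co.uk/cart.php",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Product pages should come back as allowed and the listed PHP files as blocked; if a product page shows as blocked, one of the Disallow lines is too broad.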


I uploaded the new robots.txt file, but the old errors are still there, plus now there are 25 'URL restricted by robots.txt' errors! How is that even technically an error?? Do I need to wait a bit longer for the errors to disappear or have I just made the problem worse?


I've just checked your robots.txt file and it's not the same as the one in my post #7. It's the original one, with these lines at the end:

Disallow: /*orderby=
Disallow: /*orderway=
Disallow: /*tag=
Disallow: /*id_currency=
Disallow: /*search_query=
Disallow: /*id_lang=
Disallow: /*back=
Disallow: /*utm_source=
Disallow: /*utm_medium=
Disallow: /*utm_campaign=
Disallow: /*n=

 

The URL from your link above that is restricted by robots.txt, "http://www.dura-tex.co.uk/31--fixing-underlay?n=10&id_category=31", is being blocked by the line "Disallow: /*n=".

 

The * acts as a wildcard, so every URL containing "n=" is restricted!
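
To see why that one line catches the category URL, here is a rough Python approximation of wildcard matching in Disallow rules, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. It is a sketch for experimenting with patterns, not Google's actual matcher, and `rule_to_regex` and `is_blocked` are made-up names.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow pattern into a regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query, rule):
    """Return True if the rule matches from the start of the path."""
    return rule_to_regex(rule).match(path_and_query) is not None


if __name__ == "__main__":
    category = "/31--fixing-underlay?n=10&id_category=31"
    product = "/-anti-fatigue-mats/41-coin-grip-rubber-matting-6mm.html&id_lang=2"
    for rule in ("/*n=", "/*id_lang="):
        print(rule, "| category:", is_blocked(category, rule),
              "| product:", is_blocked(product, rule))
```

The category URL matches "/*n=" because "n=" appears after the "?", while the product URL with "&id_lang=2" is only caught by "/*id_lang=".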

 

Change your robots.txt to the one in post #7, then you'll just have to wait for Google to recrawl & index the correct URLs.

 

You're right that Google lists 'URL restricted by robots.txt' under errors, but these are not errors, just restrictions.


Cheers Dazzza, that's my robots.txt hopefully sorted.

 

It's definitely time I updated PrestaShop, then I won't have to keep doing things manually and getting them wrong.

 

 

Thank you for your reply CartExpert.net. My .htaccess file was also done manually.

 

When you say that the links don't comply with friendly URLs, do you mean that the URLs should not include the product category? I had assigned these products to default categories in the BO. Should the default category be 'home' for all products when friendly URLs is turned on?


If the .htaccess file was written by hand, you should generate it automatically instead.

 

When trying to view: http://www.dura-tex.....html&id_lang=2 it is redirected to http://www.41-.html/ which suggests a bad redirect.

 

Do remember, regular expressions require experience and the smallest mistake can cause big problems (like the one you have).

 

Also, your sitemap is incorrect, so Google will keep getting the incorrect links when it's downloading the sitemap.


Thanks for all your help guys. I'm going to have to update PrestaShop to sort these issues, as v1.1 doesn't have all the generator functions, so everything has been done manually and, by all accounts, badly. I just need to find a free weekend to do it, as I modified the default theme and so have to do it from scratch.

