DavidSidecar Posted May 22, 2017

Hi there! We are starting to get a PrestaShop site with 1300 products indexed. The products were published but not optimized for SEO, so all products must remain navigable but must not be indexed. Once all the products of a subcategory are optimized, that category should be indexed. I wrote the robots.txt below:

Allow: https://nutrevital.com/
Allow: https://nutrevital.com/proteinas/*
Allow: https://nutrevital.com/aminoacidos/*
Allow: https://nutrevital.com/carbohidratos/*
Allow: https://nutrevital.com/quemadores-de-grasa/*
Allow: https://nutrevital.com/alimantacion natural/*
Allow: https://nutrevital.com/barritas/*
Allow: https://nutrevital.com/bebidas/*
Allow: https://nutrevital.com/complementos/*
Allow: https://nutrevital.com/cosmetica/*
Allow: https://nutrevital.com/protectores-y-generadores/*
Allow: https://nutrevital.com/creatinas/*
Allow: https://nutrevital.com/energeticos/*
Allow: https://nutrevital.com/estimulantes-y-precursores/*
Allow: https://nutrevital.com/vitaminas-y-minerales/*
Allow: https://nutrevital.com/voluminizadores-y-pre-entreno/*
Disallow: https://nutrevital.com/*

Is it correct? I don't mind if CMS pages or other kinds of pages are not indexed. This is a temporary measure; once all products are optimized for SEO, the problem will be solved.

Regards!
Johann Posted May 23, 2017

robots.txt is meant to control crawling, not indexing! You have to insert a meta robots 'noindex' tag in your product.tpl file, or use a module such as the ones I offer on my site.
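As an illustration of the advice above, here is a minimal sketch of the decision a template or module would have to make: emit "noindex" for products in categories that are not yet optimized, and allow indexing elsewhere. It is written in Python only to show the logic. The function name `robots_meta_for`, the `OPTIMIZED_CATEGORIES` set, and the product slugs are hypothetical; none of this is a PrestaShop API, and the real change would live in product.tpl or in a module.

```python
# Hypothetical list of category slugs whose products are already SEO-optimized.
OPTIMIZED_CATEGORIES = {"proteinas", "aminoacidos"}


def robots_meta_for(path: str, optimized: set[str] = OPTIMIZED_CATEGORIES) -> str:
    """Return the meta robots tag for a URL path such as /proteinas/some-product.

    Assumes the URL scheme described in the thread: /category/product-name.
    """
    segments = [s for s in path.split("/") if s]
    # The home page (no segments) and optimized categories may be indexed.
    if not segments or segments[0] in optimized:
        content = "index,follow"
    else:
        # Not optimized yet: keep links followed, keep the page out of the index.
        content = "noindex,follow"
    return f'<meta name="robots" content="{content}" />'


print(robots_meta_for("/proteinas/whey-example"))
print(robots_meta_for("/creatinas/creatine-example"))
```

Moving a category slug into the set is then the only change needed once its products have been optimized.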
Knowband Plugins Posted May 26, 2017

Hi,

Here are my suggestions, as I understand your question. You don't want your product pages to be indexed in Google search, but they should still be followed by Googlebot and remain functional. You also want all category pages to be indexed.

First, instead of

Disallow: https://nutrevital.com/*

use the following:

Allow : https://nutrevital.com/*

If you use the first syntax (Disallow: https://nutrevital.com/*), all the URLs of the website except the home page will be deindexed.

You should also use this format for the robots.txt file:

User-agent: *
Disallow: /category name/*

because all the product page URLs on this website have the form domain/category/product name.

Apart from that, here is another suggestion. Just use the following code on all the product pages:

<meta name="robots" content="NOINDEX,FOLLOW" />

I hope this helps.
Johann Posted May 26, 2017

Knowband Plugins wrote: "If you use the above-mentioned syntax (Disallow: https://nutrevital.com/*) then all the URLs of the website except home page will be deindexed."

Wrong! The URLs simply won't be crawled anymore, that's all. That means that even if a robots noindex tag is included in a page, it won't be read, because the page won't be crawled. As I said previously, the robots.txt file only controls crawling; it does not deindex URLs. To deindex a URL, the meta robots noindex tag must be present in the page source AND the page must be crawlable (i.e. not disallowed).
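The two conditions described above can be sketched as a small check: a noindex tag only counts if the crawler is allowed to fetch the page that carries it. This is a simplified illustration, not how any search engine is implemented. The names `RobotsMetaParser`, `is_crawlable` and `noindex_will_be_seen` are hypothetical, and the crawl check uses plain prefix matching on Disallow paths, ignoring wildcards and Allow precedence.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def is_crawlable(path, disallowed_prefixes):
    # Simplification: a path is blocked if it starts with any Disallow prefix.
    return not any(path.startswith(prefix) for prefix in disallowed_prefixes)


def noindex_will_be_seen(path, html, disallowed_prefixes):
    """True only if the page can be crawled AND its source contains noindex."""
    if not is_crawlable(path, disallowed_prefixes):
        return False  # the crawler never fetches the page, so the tag is never read
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="NOINDEX,FOLLOW" /></head></html>'
print(noindex_will_be_seen("/creatinas/example", page, ["/"]))  # blocked by robots.txt
print(noindex_will_be_seen("/creatinas/example", page, []))     # crawlable, tag is read
```

The first call returns False because everything under "/" is disallowed, which is the situation the original robots.txt would create for product pages.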
DavidSidecar (Author) Posted May 29, 2017

Hi Johann! Thanks for your time. I installed the "NO INDEX NO FOLLOW" module to set "noindex" on some product pages, and I read about the tag in product.tpl. So, do you think there is any way to make this possible using robots.txt? I'll post the status and progress of this "new way"; maybe we'll discover something xD!

Regards from Spain!
DavidSidecar (Author) Posted May 29, 2017 (edited)

Hi Knowband Plugins,

Thanks for your help. Your suggestion about adjusting these robots.txt lines sounds interesting (logical and consistent). I'm going to try it.

Regards from Spain!