robots.txt - do you take the PS default or add more to it?



Hi, my robots.txt is already pretty long, but I noticed that many, though not all, directories are listed. For example, it says:

# Directories
Disallow: /classes/
Disallow: /config/



But what about folders like /js, /docs or /css? Are these left out on purpose, or would it make sense to add them to the disallow list as well? The same goes for e.g. my custom /cms directory with PDF files for terms & conditions (for download).
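For reference, the additional entries I have in mind would look like this (paths as mentioned above; /cms/ is my own custom folder):

Disallow: /js/
Disallow: /docs/
Disallow: /css/
Disallow: /cms/
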

I also read somewhere the tip to disallow specific crawlers completely, e.g.

User-agent: EmailCollector
Disallow: /

User-agent: GagaRobot
Disallow: /



But where in the robots.txt would I put this? Anywhere? I am wondering whether these entries need to come before the part with

User-agent: *



or after it. In other words, are those crawlers really excluded if I add them at the top of my robots.txt and then, some lines further down, allow all crawlers again? Or do the exclusions still apply?
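As far as I understand the robots.txt rules, placement does not matter: each crawler obeys only the single group whose User-agent line matches it most specifically, and falls back to the * group otherwise. One way to check this yourself is with Python's standard-library urllib.robotparser (the example.com URLs are just placeholders):

```python
from urllib import robotparser

# A robots.txt where the specific crawler blocks come BEFORE
# the catch-all "User-agent: *" block.
robots_lines = """\
User-agent: EmailCollector
Disallow: /

User-agent: GagaRobot
Disallow: /

User-agent: *
Disallow: /classes/
Disallow: /config/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(robots_lines)

# Each crawler reads only the group that names it, so the
# order of the groups in the file does not matter:
print(rp.can_fetch("EmailCollector", "http://example.com/index.php"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/index.php"))       # True
print(rp.can_fetch("Googlebot", "http://example.com/classes/x.php"))   # False
```

So the EmailCollector block keeps working even though "User-agent: *" appears later in the file; a well-behaved crawler never merges the catch-all group with its own.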

thanks
phil


Hello Pippo, have a look at this page, it helps a lot...


To some extent, yes. But let me rephrase my question: can I, as a limited safety feature, disallow all folders and subfolders for crawlers, i.e. also those the PS generator does not include in the default robots.txt? If I generate and submit a sitemap to Google, is that enough to get my site indexed? And where are e.g. the friendly URLs 'stored', i.e. which subfolder do I need to leave open/allowed to ensure that Google crawls and lists my site?
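As far as I know, the friendly URLs are not files in any subfolder at all: the .htaccess rewrite rules map them to index.php in the shop root, so blocking asset folders does not block the pages themselves. The sitemap can also be announced directly in robots.txt with a line like this (example.com is a placeholder for the shop domain):

Sitemap: https://www.example.com/sitemap.xml
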

best
phil
