The robots.txt file is the mechanism almost all search engines use to let website administrators tell bots what they would like indexed. By adding this file to your web root, you can ask search engine bots not to index certain parts of your website. Example: see the drupal.org robots.txt.
A robots.txt file is included with Drupal 5.x and newer versions, though Drupal's default robots.txt file has SEO problems even in Drupal 7. If you want to create a custom robots.txt file, follow the instructions below. For more details, see http://www.robotstxt.org.
Create a file containing the content shown below and name it "robots.txt". Lines beginning with the pound sign ("#") are comments and can be deleted.
# Small robots.txt
# More information about this file can be found at
# http://www.robotstxt.org/
# In case your Drupal site is in a subdirectory of your web root (e.g. /drupal)
# add the name of this directory before the / (slash) below
# example: Disallow: /drupal/aggregator
# To stop a polite robot indexing an example dir
# add a line like: User-agent: polite-bot
# and: Disallow: /example-dir/
User-agent: *
Crawl-Delay: 10
# Paths (clean URLs)
Disallow: /aggregator
Disallow: /tracker
Disallow: /comment/reply
Disallow: /node/add
Disallow: /search/
Disallow: /book/print
Disallow: /logout
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
# Paths (no clean URLs)
Disallow: /?q=aggregator
Disallow: /?q=tracker
Disallow: /?q=comment/reply
Disallow: /?q=node/add
Disallow: /?q=user/register
Disallow: /?q=user/password
Disallow: /?q=user/login
Disallow: /?q=search/
Disallow: /?q=book/print
The rules above instruct search engine bots to avoid pages that are meant only for users, such as the search page or the add-comment pages.
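To check what a set of rules actually blocks, one option is Python's standard `urllib.robotparser` module. The following is a minimal sketch, not part of Drupal: the shortened `RULES` excerpt, the `is_allowed` helper, and the bot name `example-bot` are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, shortened excerpt of a robots.txt file for illustration.
RULES = """\
User-agent: *
Disallow: /search/
Disallow: /comment/reply
Disallow: /user/login
"""


def is_allowed(path, rules=RULES, agent="example-bot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/node/1", "/search/drupal", "/comment/reply/12"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Rules are matched as simple path prefixes here, so `/search/drupal` is blocked by `Disallow: /search/` while `/node/1` stays crawlable.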
A common SEO problem on Drupal sites is that search engines index URLs with query parameters that should not be indexed. The wildcard (*) is not part of the original robots.txt standard, but Google and Bing obey it. Most Drupal sites should include these rules inside the "User-agent: *" record of the robots.txt file:
# Blocks user "track" pages
Disallow: /*/track$
# Blocks common URL parameters created by the Views module on tables
Disallow: /*sort=
Disallow: /*size=
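Python's `urllib.robotparser` does not interpret wildcards, so the sketch below shows one way to reason about which URLs such rules would match. It assumes the commonly documented semantics in which `*` matches any run of characters and a trailing `$` anchors the end of the URL; the helper names `rule_to_regex` and `rule_matches` are hypothetical, and real crawlers may differ in details.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule using * and a trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if `path` (including any query string) matches `rule`."""
    # re.match anchors at the start, mirroring prefix matching of rules.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    print(rule_matches("/*/track$", "/user/1/track"))
    print(rule_matches("/*sort=", "/admin/content?sort=asc"))
```

Under these assumptions, `/*/track$` matches `/user/1/track` but not `/user/1/track/feed`, and `/*sort=` matches any URL containing `sort=` after the leading slash.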
Some bots obey the "Crawl-delay:" parameter. Drupal sites tend to be crawled heavily, and on many sites aggressive bots generate more requests than human visitors, so it can be wise to slow the robots down by adding lines like these to your robots.txt:
User-Agent: *
Crawl-Delay: 10
10 is the delay in seconds between page requests.
Both "Slurp" (Yahoo's and AltaVista's bot) and the Microsoft bots for Live Search obey this parameter. Googlebot does not use the "Crawl-delay" parameter. (You can, however, control the crawl rate used by Googlebot via their Webmaster Tools Home page.)
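As an illustration of how a well-behaved bot might read this value, here is a small sketch using `RobotFileParser.crawl_delay` from Python's standard library (available since Python 3.6). The helper name `crawl_delay_for` and the bot name `example-bot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the text above.
RULES = """\
User-Agent: *
Crawl-Delay: 10
"""


def crawl_delay_for(rules, agent="example-bot"):
    """Return the Crawl-delay in seconds that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    delay = crawl_delay_for(RULES)
    print("Seconds to wait between requests:", delay)
```

A crawler honoring the parameter would sleep for that many seconds between page requests; a bot that ignores it simply never looks the value up.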
Change the file as you wish and save it. Now upload it to your web server and make sure you put it in your web root. If you have installed Drupal in a subdirectory (for example /drupal), then change the URLs in robots.txt, but place the file in your web root anyway and not in Drupal's root folder.
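Rewriting every path by hand is error-prone, so the subdirectory adjustment could be scripted. The sketch below is one possible approach, with a hypothetical helper `prefix_rules` that prepends the subdirectory to each Allow and Disallow path and leaves comments and other lines untouched.

```python
def prefix_rules(rules, subdir):
    """Prepend `subdir` to the path of every Allow/Disallow line in `rules`."""
    cleaned = subdir.strip("/")
    if not cleaned:
        # Drupal lives in the web root, so nothing needs to change.
        return rules
    prefix = "/" + cleaned
    out = []
    for line in rules.splitlines():
        field, sep, value = line.partition(":")
        is_rule = field.strip().lower() in ("allow", "disallow")
        if sep and is_rule and value.strip().startswith("/"):
            line = f"{field}: {prefix}{value.strip()}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /aggregator\nDisallow: /?q=search/"
    print(prefix_rules(sample, "/drupal"))
```

The resulting file still belongs in the web root, since robots only request `/robots.txt` at the top level of the host.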
Now watch the robots visit your site. After some time, check your log files (the "referrer log") to see how many visitors came from a search engine.
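If your server writes the referrer into its access log, a short script can do the counting. This is a rough sketch that assumes the Apache/Nginx "combined" log format; the `SEARCH_ENGINES` list and the helper `count_search_referrals` are hypothetical, and the substring match on host names is only a heuristic.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# In the combined format the referrer is the quoted field that follows
# the quoted request line, the status code, and the response size.
REFERRER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')

# Assumed list of search engine host fragments; adjust for your audience.
SEARCH_ENGINES = ("google.", "bing.com", "yahoo.", "duckduckgo.com")


def count_search_referrals(lines):
    """Count log lines whose referrer host looks like a search engine."""
    counts = Counter()
    for line in lines:
        match = REFERRER_RE.search(line)
        if not match:
            continue
        host = urlparse(match.group(1)).hostname or ""
        for engine in SEARCH_ENGINES:
            if engine in host:
                counts[engine] += 1
                break
    return counts
```

To use it, pass an open log file, for example `count_search_referrals(open("access.log"))`, where the file name is a placeholder for your own log path.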
If you are using a multi-site setup and you want to control robot settings for each site individually, you will not be able to use a single static robots.txt file, because all sites share the same web root. Please use the RobotsTxt module instead.