- The Web Robots Pages www.robotstxt.org/wc/robots.html
which include, for example (more exist):
- Web Robots FAQ www.robotstxt.org/wc/faq.html
- www.robotstxt.org - the main source of information on robots.txt and the Robots Exclusion Standard, with further articles about writing well-behaved Web robots.
Uncomment / add:
    <head>
    ...
    <meta name="robots" content="NOINDEX, NOFOLLOW">
    <META name="ROBOTS" content="NONE">  <!-- same as above -->
    ...
    </head>
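To check which robots meta tags a page actually carries, the tags above can be extracted with Python's standard html.parser module. This is only a minimal sketch: the class name RobotsMetaFinder and the function find_robots_meta are hypothetical helpers made up for illustration, not part of any robots tooling.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META name=...> matches too.
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values keep their case, so compare the name case-insensitively.
        if (attr.get("name") or "").lower() == "robots":
            self.contents.append(attr.get("content") or "")


def find_robots_meta(html):
    """Return the content strings of all robots meta tags found in html."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.contents
```

Calling find_robots_meta on a page containing both tags shown above would return both content strings, in document order.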
More details, from www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt :
1. ROBOTS meta-tag

    <META name="ROBOTS" content="ALL | NONE | NOINDEX | NOFOLLOW">

    default = empty = "ALL"
    "NONE" = "NOINDEX, NOFOLLOW"

The filler is a comma separated list of terms: ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

Discussion: This tag is meant to provide users who cannot control the robots.txt file at their sites. It provides a last chance to keep their content out of search services. It was decided not to add syntax to allow robot specific permissions within the meta-tag.

INDEX means that robots are welcome to include this page in search services. FOLLOW means that robots are welcome to follow links from this page to find other pages.

So a value of "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. A value of "NOFOLLOW" allows the page to be indexed, but no links from the page are explored (this may be useful if the page is a free entry point into pay-per-view content, for example). A value of "NONE" tells the robot to ignore the page.
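The semantics quoted above can be sketched as a small function that turns a content string into two flags. The function name parse_robots_content is hypothetical, and two details are assumptions the quoted text does not settle: terms are matched case-insensitively, and when terms conflict the later one wins.

```python
def parse_robots_content(content):
    """Return (index, follow) booleans for a ROBOTS meta-tag content string."""
    index, follow = True, True  # default = empty = "ALL"
    for term in content.split(","):
        term = term.strip().upper()  # assumption: terms are case-insensitive
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            index, follow = False, False  # "NONE" = "NOINDEX, NOFOLLOW"
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Unknown or empty terms are ignored (an assumption; the text does not say).
    return index, follow
```

With this sketch, "NOINDEX" yields (False, True): the page is left out of the index but its links may still be explored, matching the discussion above.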
robots.txt is a plain text file placed at the web document root (not the server's filesystem root or any other directory).
Simplest example, disallow all actions by robots:
    User-agent: *
    Disallow: /
A more useful example, e.g. to let your own search engine skip certain areas:
    # /robots.txt file for my own site
    # comments start with '#'

    User-agent: webcrawler
    Disallow:             # empty value = all URLs can be retrieved

    User-agent: lycra
    Disallow: /           # disallows the whole site

    User-agent: *
    Disallow: /tmp
    Disallow: /logs
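One way to check how such rules are interpreted is Python's standard urllib.robotparser module. The sketch below feeds it the rules above as a string instead of fetching them; the host www.example.com, the agent name "otherbot" and the helper names build_parser and allowed are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# /robots.txt file for my own site
User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /tmp
Disallow: /logs
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, agent, path):
    """Ask whether the given user agent may fetch path on a placeholder host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


rules = build_parser(ROBOTS_TXT)
print(allowed(rules, "webcrawler", "/tmp/file.html"))  # empty Disallow: everything allowed
print(allowed(rules, "lycra", "/index.html"))          # whole site disallowed
print(allowed(rules, "otherbot", "/logs/access.log"))  # falls under the '*' entry
```

Note that matching is by path prefix, so "Disallow: /tmp" also covers a path such as /tmpfile.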
More information can be found in, e.g., A Standard for Robot Exclusion.