Web Robots
Search Robots


HTML Documents

Uncomment / add:

<meta name="robots" content="NOINDEX, NOFOLLOW">
<META name="ROBOTS" content="NONE">  # same as above
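A robot has to find this tag before it can act on it. The following is a minimal Python sketch of that lookup, using only the standard library's html.parser. The names RobotsMetaFinder and find_robots_meta are made up for illustration, and real robots may treat malformed pages differently.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Remember the content value of the last ROBOTS meta-tag seen."""

    def __init__(self):
        super().__init__()
        self.content = None  # None means no ROBOTS meta-tag was found

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.content = attr.get("content") or ""


def find_robots_meta(html):
    """Return the ROBOTS meta content string, or None if the tag is absent."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.content
```

For a page containing either of the two tags above, find_robots_meta returns the content string unchanged, so interpreting the terms is left to the caller.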

More details, from :

1.  ROBOTS meta-tag

	<META name="ROBOTS"
	          content="ALL | NONE | NOINDEX | NOFOLLOW">

	default = empty = "ALL"

        The filler is a comma-separated list of terms:
        ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

        Discussion: This tag is meant for users who cannot control the
        robots.txt file at their sites.  It provides a last chance to
        keep their content out of search services.  It was decided not to
        add syntax to allow robot-specific permissions within the meta-tag.

        INDEX means that robots are welcome to include this page in
        search services. 

        FOLLOW means that robots are welcome to follow links from this
        page to find other pages.

        So a value of "NOINDEX" allows the subsidiary links to be explored,
        even though the page is not indexed.  A value of "NOFOLLOW" allows the
        page to be indexed, but no links from the page are explored (this may
        be useful if the page is a free entry point into pay-per-view content,
        for example).  A value of "NONE" tells the robot to ignore the page.
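The term semantics described above can be sketched as a small Python function that maps a content value to two flags. The function name parse_robots_meta is hypothetical, and two behaviours here are assumptions rather than anything the text specifies: unknown terms are ignored, and when terms conflict the later one wins.

```python
def parse_robots_meta(content=""):
    """Return (index, follow) for a ROBOTS meta-tag content value."""
    index, follow = True, True  # default = empty = "ALL"
    for term in content.split(","):
        term = term.strip().upper()  # terms are treated case-insensitively
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
        # Assumption: anything else (including an empty term) is ignored.
    return index, follow
```

With this sketch, "NOINDEX, NOFOLLOW" and "NONE" both yield (False, False), while "NOINDEX" alone yields (False, True): the page is skipped but its links may still be followed.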

/robots.txt File

A plain text file placed at the root of the web site (the web document root, not the server's filesystem root or any other directory).

Simplest example, disallow all actions by robots:

User-agent: *
Disallow: /

A more useful example, giving one robot full access, shutting another out entirely, and keeping all other robots out of certain areas:

# /robots.txt file for my own site  # comments start with '#'

User-agent: webcrawler
Disallow:            # empty value = all URLs can be retrieved

User-agent: lycra
Disallow: /          # disallows the whole site

User-agent: *
Disallow: /tmp
Disallow: /logs
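
To check how a set of rules like this is interpreted, Python's standard urllib.robotparser module can parse the text and answer per-robot questions. This is only a sketch: the robot name "otherbot" and the sample paths are made up for illustration, and other robots' parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above, with records separated by blank lines.
ROBOTS_TXT = """\
# /robots.txt file for my own site
User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /tmp
Disallow: /logs
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access

# webcrawler may fetch anything; lycra may fetch nothing.
webcrawler_tmp = parser.can_fetch("webcrawler", "/tmp/scratch.html")
lycra_home = parser.can_fetch("lycra", "/index.html")

# Any other robot falls under the "*" record.
other_home = parser.can_fetch("otherbot", "/index.html")
other_tmp = parser.can_fetch("otherbot", "/tmp/scratch.html")
other_logs = parser.can_fetch("otherbot", "/logs/access.log")

print(webcrawler_tmp, lycra_home, other_home, other_tmp, other_logs)
```

To test a live site instead, set_url() followed by read() fetches the file over HTTP before can_fetch() is called.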

For more information see, for example, "A Standard for Robot Exclusion".