robots.txt to Limit Search Engine Indexing
How to block search engines from indexing specific directories on your web site? Very simple. Create a file called robots.txt and add the following lines:
User-agent: *
Disallow: /images-directory/
Next, save the robots.txt file in your web root and you’re done. Piece of cake, huh?
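Before uploading the file, one way to sanity-check the rule is to feed the same two lines to Python's standard-library urllib.robotparser. This is only a sketch: the example.com URLs and the "Googlebot" user-agent string are placeholders chosen for illustration, and a real crawler may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same two directives as in the file above, one per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images-directory/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com and "Googlebot" are placeholders for illustration only.
images_ok = parser.can_fetch("Googlebot", "https://example.com/images-directory/photo.jpg")
about_ok = parser.can_fetch("Googlebot", "https://example.com/about/")

print(images_ok)  # False: the path starts with /images-directory/
print(about_ok)   # True: no rule matches this path
```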
Please note, this is neither a security nor an access control mechanism. The robots.txt file is saved in your web root and hence can be accessed by the whole world. As a best practice, files which are to be kept private should not be stored on the Internet. If they must be, encrypt them and store them in a folder which requires user authentication and access control.
robots.txt to ALLOW Search Engine Indexing By Specific Bots
The first directive in the robots.txt file below blocks all bots from accessing our images-directory. However, the next directive provides an exception which allows the Twitter card bot (Twitterbot) to index this directory:
User-agent: *
Disallow: /images-directory/

User-agent: Twitterbot
Allow: /images-directory/
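The exception can be checked the same way. The sketch below assumes Python's standard-library urllib.robotparser as a stand-in for a real crawler; the example.com URL and the "Googlebot" agent name are placeholders, not part of the original setup.

```python
from urllib.robotparser import RobotFileParser

# Two groups: a catch-all group and a Twitterbot-specific exception.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images-directory/

User-agent: Twitterbot
Allow: /images-directory/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL for illustration only.
image_url = "https://example.com/images-directory/card.png"

twitterbot_ok = parser.can_fetch("Twitterbot", image_url)
other_bot_ok = parser.can_fetch("Googlebot", image_url)

print(twitterbot_ok)  # True: the Twitterbot group applies and allows the path
print(other_bot_ok)   # False: only the catch-all group applies
```

A bot that matches a named group follows that group instead of the catch-all one, which is why Twitterbot gets through while every other bot stays blocked.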
Adding Successive Directives to robots.txt
Note that we can add successive directives by entering a physical new-line after each Allow or Disallow.
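To see why the new-line matters, the sketch below parses the same rule twice with Python's standard-library urllib.robotparser: once squeezed onto a single line and once with one directive per line. The helper name can_crawl, the "Googlebot" agent and the photo.jpg path are made up for this illustration, and other parsers may handle the malformed version differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Everything on one line: the parser sees a single odd User-agent value
# and no Disallow rule at all.
one_line = "User-agent: * Disallow: /images-directory/"

# One directive per line: the Disallow is recognised as a rule.
one_per_line = "User-agent: *\nDisallow: /images-directory/\n"

path = "/images-directory/photo.jpg"  # hypothetical path
squeezed_result = can_crawl(one_line, path)
split_result = can_crawl(one_per_line, path)

print(squeezed_result)  # True: nothing is actually blocked
print(split_result)     # False: the directory is blocked
```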
robots.txt to DISALLOW a Specific WordPress Post
The format in the example below assumes you have pretty URLs set up and want to exclude a specific WordPress page or post. In the robots.txt file below, the WordPress post we wanted to exclude has a URL slug of wedding-anniversary-photos:
User-agent: *
Disallow: /images-directory/
Disallow: /wedding-anniversary-photos/

User-agent: Twitterbot
Allow: /images-directory/
Refer to the Google Webmaster Tools link on URL matching based on path values to further customize file exclusion directives.
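As a rough illustration of path-based matching, the sketch below runs a few made-up URLs against the post's Disallow rule using Python's standard-library urllib.robotparser. That parser does plain prefix matching and, to my knowledge, does not implement the wildcard patterns Google describes, so treat the results as a sketch of the basic behaviour only. All candidate URLs and the "Googlebot" agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /wedding-anniversary-photos/",
])

# Hypothetical URLs; only the first two start with the disallowed path.
candidates = [
    "/wedding-anniversary-photos/",
    "/wedding-anniversary-photos/comment-page-2/",
    "/wedding-anniversary-photos",        # no trailing slash
    "/wedding-anniversary-photos-2019/",  # a different slug
]

results = {url: parser.can_fetch("Googlebot", url) for url in candidates}

for url, ok in results.items():
    print(f"{url}: {'allowed' if ok else 'blocked'}")
```

Note that the trailing slash in the rule matters: the version of the URL without it is not covered by this prefix.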
To Learn More About robots.txt, Consult These References
- A table from Google on URL matching based on path values. Scroll to the bottom of the page or press CTRL+F and search for that phrase
- Google’s webmaster support site
- This short and quick cheat sheet from Jacob Kristensen’s Brugbart
- A more detailed cheat sheet from “The Site Wizard”