1. Using robots.txt to Limit Search Engine Indexing

How do you block search engines from indexing specific directories on your website? It’s very simple. Create a file called robots.txt and add the following lines:

User-agent: *
Disallow: /images-directory/

Next, save the robots.txt file in your web root and you’re done. Piece of cake, huh?
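
If you would like to sanity-check the rule before deploying it, Python’s standard-library urllib.robotparser can evaluate a robots.txt against sample URLs. Here is a minimal sketch; example.com and the photo path are just placeholders for your own site, and real crawlers may differ from this parser in small details:

from urllib.robotparser import RobotFileParser

# The two-line robots.txt from above.
rules = """\
User-agent: *
Disallow: /images-directory/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The images directory is off limits to compliant crawlers...
print(parser.can_fetch("Googlebot", "https://example.com/images-directory/photo.jpg"))  # False
# ...while the rest of the site remains crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/about/"))  # True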

Please note that this is neither a security nor an access control mechanism. The robots.txt file sits in your web root and can therefore be read by anyone. As a best practice, files that must remain private should not be stored on the Internet at all. If they must be, encrypt them and store them in a folder that requires user authentication and access control.

2. Using robots.txt to ALLOW Search Engine Indexing By Specific Bots

The first group of directives in the robots.txt file below blocks all bots from accessing our images directory. The next group provides an exception that allows Twitterbot (Twitter’s card crawler) to index this directory:

User-agent: *
Disallow: /images-directory/

User-agent: Twitterbot
Allow: /images-directory/

Adding Successive Directives to robots.txt

Note that each directive goes on its own line. To add a new User-agent group, leave a blank line after the previous group’s last Allow or Disallow line, as in the example above.
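
To confirm the per-bot exception behaves as intended, the same standard-library parser can be pointed at the grouped rules above. In this sketch, Googlebot simply stands in for any crawler other than Twitterbot:

from urllib.robotparser import RobotFileParser

# The grouped rules from above: a blank line separates one
# User-agent group from the next.
rules = """\
User-agent: *
Disallow: /images-directory/

User-agent: Twitterbot
Allow: /images-directory/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

image = "https://example.com/images-directory/wedding.jpg"
print(parser.can_fetch("Twitterbot", image))  # True  -- the Twitterbot exception applies
print(parser.can_fetch("Googlebot", image))   # False -- other bots fall back to the * group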

Twitter Summary Card and Images

If Twitter summary cards are not showing your images, the most likely culprit is an overly restrictive robots.txt rule blocking the folder where your images are stored.

3. Use robots.txt to DISALLOW a Specific WordPress Post

The example below assumes you have pretty permalinks (pretty URLs) set up and want to exclude a specific WordPress page or post. In the robots.txt file below, the WordPress post we want to exclude has a URL slug of wedding-anniversary-photos:

User-agent: *
Disallow: /images-directory/

User-agent: Twitterbot
Allow: /images-directory/

User-agent: *
Disallow: /wedding-anniversary-photos/

Refer to the Google Webmaster Tools documentation on URL matching based on path values (reference 1 below) to further customize your exclusion directives.
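
As a quick illustration of how path values are matched: a Disallow value is treated as a path prefix, so the trailing slash matters. The /wedding-anniversary-photos-raw/ slug in the sketch below is purely hypothetical and included only for contrast:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wedding-anniversary-photos/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The post itself and anything beneath it are blocked...
print(parser.can_fetch("Googlebot", "/wedding-anniversary-photos/"))          # False
print(parser.can_fetch("Googlebot", "/wedding-anniversary-photos/slide-1/"))  # False
# ...but a slug that merely begins with similar text is not.
print(parser.can_fetch("Googlebot", "/wedding-anniversary-photos-raw/"))      # True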

4. To Learn More About robots.txt Consult These References

  1. A table from Google on URL matching based on path values (scroll to the bottom of the page, or do a CTRL+F search for that phrase)
  2. Google’s webmaster support site
  3. A short and quick cheat sheet from Jacob Kristensen’s Brugbart
  4. A more detailed cheat sheet from “The Site Wizard”