Useful robots.txt rules
Here are some common useful robots.txt rules:
Disallow crawling of the entire site
Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

User-agent: *
Disallow: /

Disallow crawling of a directory and its contents
Append a forward slash to the directory name to disallow crawling of a whole directory.

User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler
Only googlebot-news may crawl the whole site.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Allow access to all but a single crawler
Unnecessarybot may not crawl the site; all other bots may.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

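If you want to check how different crawlers are matched against these groups, Python's standard-library urllib.robotparser can serve as a quick sanity check. This is only a rough sketch: the bot names come from the example above, the URL is a placeholder, and the standard-library parser only does simple prefix matching, so it is not a substitute for Google's own robots.txt testing tools.

from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /
""".splitlines())

# The specific group applies to Unnecessarybot; any other bot falls back to the '*' group.
print(parser.can_fetch("Unnecessarybot", "https://example.com/page.html"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/page.html"))    # True
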
Disallow crawling of a single web page
For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.

User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

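The same standard-library parser can confirm that only the listed paths are blocked while the rest of the site stays crawlable. Again, this is just an illustrative sketch using the file names from the example above.

from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html
""".splitlines())

print(parser.can_fetch("*", "https://example.com/useless_file.html"))             # False: blocked
print(parser.can_fetch("*", "https://example.com/junk/other_useless_file.html"))  # False: blocked
print(parser.can_fetch("*", "https://example.com/index.html"))                    # True: everything else is allowed
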
Disallow crawling of the whole site except a subdirectory
Crawlers may only access the public subdirectory.

User-agent: *
Disallow: /
Allow: /public/

Block a specific image from Google Images
For example, disallow the dogs.jpg image.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images
Google can't index images and videos without crawling them.

User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type
For example, disallow crawling of all .gif files.

User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google
This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Use the * and $ wildcards to match URLs that end with a specific string
For example, disallow all .xls files.

User-agent: Googlebot
Disallow: /*.xls$

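As a mental model, * matches any sequence of characters and a trailing $ anchors the pattern to the end of the URL. The sketch below is not Google's matcher; it simply translates such a pattern into an equivalent regular expression so you can see which URLs a rule like Disallow: /*.xls$ covers. The helper name and sample paths are made up for illustration.

import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper, not part of any library: '*' becomes '.*',
    # a trailing '$' anchors the end of the URL, everything else is literal.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.xls$")
print(bool(rule.match("/reports/q3.xls")))      # True: the URL ends with .xls
print(bool(rule.match("/reports/q3.xlsx")))     # False: .xlsx does not end the URL with .xls
print(bool(rule.match("/files/data.xls?x=1")))  # False: the query string means the URL no longer ends with .xls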

