Useful robots.txt rules
Here are some common useful robots.txt rules:
Disallow crawling of the entire site
Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

User-agent: *
Disallow: /

Disallow crawling of a directory and its contents
Append a forward slash to the directory name to disallow crawling of a whole directory.

User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler
Only googlebot-news may crawl the whole site.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Allow access to all but a single crawler
Unnecessarybot may not crawl the site; all other bots may.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

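If you want to check how different crawlers are matched against these groups, Python's standard-library urllib.robotparser can serve as a quick sanity check. This is only a rough sketch: the bot names come from the example above, the URL is a placeholder, and the standard-library parser only does simple prefix matching, so it is not a substitute for Google's own robots.txt testing tools.

from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /
""".splitlines())

# The specific group applies to Unnecessarybot; any other bot falls back to the '*' group.
print(parser.can_fetch("Unnecessarybot", "https://example.com/page.html"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/page.html"))    # True
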
Disallow crawling of a single web page
For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.

User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

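The same standard-library parser can confirm that only the listed paths are blocked while the rest of the site stays crawlable. Again, this is just an illustrative sketch using the file names from the example above.

from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html
""".splitlines())

print(parser.can_fetch("*", "https://example.com/useless_file.html"))             # False: blocked
print(parser.can_fetch("*", "https://example.com/junk/other_useless_file.html"))  # False: blocked
print(parser.can_fetch("*", "https://example.com/index.html"))                    # True: everything else is allowed
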
Disallow crawling of the whole site except a subdirectory
Crawlers may only access the public subdirectory.

User-agent: *
Disallow: /
Allow: /public/

Block a specific image from Google Images
For example, disallow the dogs.jpg image.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images
Google can't index images and videos without crawling them.

User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type
For example, disallow crawling of all .gif files.

User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google
This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Use the * and $ wildcards to match URLs that end with a specific string
For example, disallow all .xls files.

User-agent: Googlebot
Disallow: /*.xls$

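As a mental model, * matches any sequence of characters and a trailing $ anchors the pattern to the end of the URL. The sketch below is not Google's matcher; it simply translates such a pattern into an equivalent regular expression so you can see which URLs a rule like Disallow: /*.xls$ covers. The helper name and sample paths are made up for illustration.

import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper, not part of any library: '*' becomes '.*',
    # a trailing '$' anchors the end of the URL, everything else is literal.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.xls$")
print(bool(rule.match("/reports/q3.xls")))      # True: the URL ends with .xls
print(bool(rule.match("/reports/q3.xlsx")))     # False: .xlsx does not end the URL with .xls
print(bool(rule.match("/files/data.xls?x=1")))  # False: the query string means the URL no longer ends with .xls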

