Introduction to robots.txt
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site.
This is used mainly to avoid overloading your site with requests; it is not a
mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
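As a minimal sketch of the noindex approach, a rule can be placed in the HTML of the page you want kept out of search results (the page itself is a hypothetical example):

```html
<!-- In the <head> of the page to exclude from indexing -->
<meta name="robots" content="noindex">
```

Note that for noindex to work, the crawler must be able to fetch the page, so the page must not also be disallowed in robots.txt.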
What is a robots.txt file used for?
A robots.txt file is used primarily to manage crawler traffic to your site, and usually to keep a file off Google, depending on the file type:
robots.txt effect on different file types
Web page
You can use a robots.txt file for web pages (HTML, PDF, or other non-media formats that Google can read),
to manage crawling traffic if you think your server will be overwhelmed by requests
from Google's crawler, or to avoid crawling unimportant or similar pages on your site.
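As an illustrative sketch (the paths are hypothetical, not drawn from any real site), a robots.txt file that steers crawlers away from unimportant or near-duplicate pages could look like this:

```text
# Applies to all crawlers that honor robots.txt
User-agent: *
# Keep crawlers out of internal search results and temporary files
Disallow: /search/
Disallow: /tmp/
```

The file lives at the root of the site (for example, https://example.com/robots.txt) and each group of rules starts with a User-agent line naming the crawler it applies to.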
If your web page is blocked with a robots.txt file, its URL can still
appear in search results, but the search result will not have a description.
Image files, video files, PDFs, and other non-HTML files embedded in the blocked page will
be excluded from crawling, too, unless they're referenced by other pages that are allowed
for crawling. If you see this search result for your page and want to fix it, remove the
robots.txt entry blocking the page. If you want to hide the page completely from Search,
use another method.
Media file
Use a robots.txt file to manage crawl traffic, and also to prevent image, video, and
audio files from appearing in Google search results. This won't prevent other pages or
users from linking to your image, video, or audio file.
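As a sketch, a rule aimed at Google's image crawler could keep a particular image out of Google Images (the file path is a made-up example):

```text
# Block one image file from Google Images
User-agent: Googlebot-Image
Disallow: /images/example.jpg
```

Using Disallow: / under the same user agent would instead block all images on the site from Google Images.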
Resource file
You can use a robots.txt file to block resource files such as unimportant image, script,
or style files, if you think that pages loaded without these resources will not
be significantly affected by the loss. However, if the absence of these
resources makes the page harder for Google's crawler to understand, don't block
them, or else Google won't do a good job of analyzing pages that depend on
those resources.
Understand the limitations of a robots.txt file
Before you create or edit a robots.txt file, you should know the limits of this URL blocking
method. Depending on your goals and situation, you might want to consider other mechanisms to
ensure your URLs are not findable on the web.
robots.txt rules may not be supported by all search engines. The instructions in robots.txt files cannot enforce crawler behavior to your site; it's up
to the crawler to obey them. While Googlebot and other respectable web crawlers obey the
instructions in a robots.txt file, other crawlers might not. Therefore, if you want to keep
information secure from web crawlers, it's better to use other blocking methods, such as password-protecting private files on your server.
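As one common sketch of password protection, an Apache server can require HTTP basic authentication for a directory via an .htaccess file (the path to the password file is a hypothetical example):

```text
# .htaccess in the directory to protect
AuthType Basic
AuthName "Restricted content"
# Hypothetical path to the htpasswd file created with the htpasswd tool
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Crawlers cannot fetch pages behind this prompt, so the content cannot be crawled or indexed at all.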
Different crawlers interpret syntax differently. Although respectable web crawlers follow the rules in a robots.txt file, each crawler
might interpret the rules differently. You should know the proper syntax for addressing
different web crawlers as some might not understand certain instructions.
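To see how one parser evaluates rules for a given user agent, here is a short sketch using Python's standard urllib.robotparser (the domain and paths are hypothetical):

```python
from urllib import robotparser

# Parse rules from a string instead of fetching over the network.
rules = """
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Disallowed path: the parser reports it as not fetchable.
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
# Path not covered by any rule: allowed by default.
print(rp.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
```

Other parsers may resolve edge cases (wildcards, conflicting rules, unknown directives) differently, which is exactly why knowing the syntax each crawler supports matters.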
A page that's disallowed in robots.txt can
still be indexed if linked to from other sites. While Google won't crawl or index the content blocked by a robots.txt file, we might still
find and index a disallowed URL if it is linked from other places on the web. As a result,
the URL address and, potentially, other publicly available information such as anchor text
in links to the page can still appear in Google search results. To properly prevent your URL
from appearing in Google search results, password-protect the files on your server, use the noindex meta tag or response header,
or remove the page entirely.
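For non-HTML files such as PDFs, the noindex rule can be sent as an HTTP response header instead of a meta tag. As a sketch for Apache with mod_headers enabled (the filename is a hypothetical example):

```text
# Send a noindex header for one file
<Files "private.pdf">
  Header set X-Robots-Tag "noindex"
</Files>
```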
Last updated 2025-02-04 UTC.