Add a Robots.txt File to Customize How Your Site is Crawled by Search Engines
A robots.txt file tells web robots (such as search engine crawlers) which parts of your website they may crawl and which they should skip. You can use it to block a single page, a subdirectory, or your entire site.
Note: this is different from a sitemap.xml file, which helps web crawlers and search engines find and index the pages on your site.
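For example, a small robots.txt file that hides one page and one subdirectory from all crawlers might look like this (the paths are placeholders; substitute your own):

```
# Apply these rules to all crawlers
User-agent: *

# Block a single page
Disallow: /agent-only.html

# Block an entire subdirectory
Disallow: /private/

# To block the whole site instead, you would use:
# Disallow: /
```

Each `Disallow` line lists a path crawlers should stay away from; a lone `/` covers the entire site.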
Why would you want to use a robots.txt file?
In general, most agents want search engines to index their website. But there are a few reasons why you may want to exclude certain pages:
1. You don't want your site to be indexed until you've done a compliance review.
2. You want to keep specific pages, such as "agent only" or "for existing customers" pages, hidden unless you direct people to them.
3. You have duplicate pages on your site and you want to make sure only one of them ranks.
How to set up your robots.txt file
First, you only need a robots.txt file if you want to exclude pages from being crawled. Second, we've done our best to make this easy, but this is a technical aspect of your site, and small mistakes can have unintended consequences. When in doubt, reach out to us and we'll help.
To set up your robots.txt file, follow these steps:
1. Go to your AgentMethods site settings page
2. Scroll to the bottom and look for "Custom Robots File"
3. Select "allow", "block", or "custom".
4. For "custom", refer to Google's robots.txt documentation on how to specify certain pages or directories.
5. Click "Save Site Settings"
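As a sketch of what you might paste into the "custom" option, the rules below block a hypothetical "existing customers" section while leaving the rest of the site open (the directory name and sitemap URL are examples only):

```
User-agent: *
Disallow: /existing-customers/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

Listing your sitemap here is optional, but it helps crawlers find the pages you do want indexed.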
Be aware that some crawlers may choose to ignore your robots.txt file. This is especially common with nefarious crawlers such as malware robots or email address scrapers.
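If you want to double-check how well-behaved crawlers will read your rules, Python's standard library includes a robots.txt parser. This sketch feeds it a small rule set inline (the rules and URLs are placeholders) and asks whether specific pages may be fetched:

```python
from urllib import robotparser

# Build a parser and load a sample rule set.
# In practice you could call rp.set_url(...) / rp.read() against your live file.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Ask whether a generic crawler ("*") may fetch each URL.
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False: blocked
print(rp.can_fetch("*", "https://example.com/index.html"))         # True: allowed
```

This only tells you how a compliant crawler interprets your file; as noted above, bad actors may ignore it entirely.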