Free tool

Robots.txt analyzer

Paste your robots.txt and get a plain-English breakdown of what each rule actually does.

Or fetch yours from https://yourdomain.com/robots.txt and paste it here. Nothing is sent to our servers; parsing happens in your browser.

Findings
No issues spotted. This robots.txt looks clean.
What each group does, in plain English
All crawlers (User-agent: *)
  • L2: Blocks any URL starting with '/admin/'.
  • L3: Blocks any URL starting with '/api/'.
  • L4: Explicitly allows '/api/public/'. It overrides the Disallow for '/api/' because the longer, more specific match wins.
  • L5: Blocks any URL ending in '.pdf' (pattern '/*.pdf$'). The $ anchors the match to the URL's end.
Googlebot
  • L8: Blocks any URL starting with '/private/'.
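
The page does not show the input behind this breakdown. As a rough sketch, the Python below assumes a file reconstructed from the line references (L2 to L5, L8) and splits it into groups the way the breakdown is organized. SAMPLE and parse_groups are hypothetical names, not part of the analyzer.

```python
# Assumed input, reconstructed from the line references in the breakdown.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/public/
Disallow: /*.pdf$

User-agent: Googlebot
Disallow: /private/
"""

def parse_groups(text):
    """Return a list of (user_agents, rules); each rule is (line_no, directive, value)."""
    groups = []
    agents, rules = [], []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((line_no, field, value))
    if agents:
        groups.append((agents, rules))
    return groups

for agents, rules in parse_groups(SAMPLE):
    print(agents, rules)
```

Each rule keeps its 1-based line number, which is what lets a report point back at the exact line it is describing.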

What does robots.txt actually do?

robots.txt is a small plain-text file at the root of your domain that tells web crawlers which paths they're allowed to fetch. It's a polite request, not a hard wall — well-behaved crawlers (Google, Bing, Anthropic, OpenAI) respect it, but malicious scrapers ignore it.

It's also unforgiving: a single character can take your whole site out of search results. The most catastrophic configuration — User-agent: * followed by Disallow: / — is one deploy from de-indexing your entire production site.

The directives that actually matter

  • User-agent: names the crawler the rules apply to. * means “all crawlers”. You can target specific ones — Googlebot, Bingbot, GPTBot, PerplexityBot — when you want different rules per crawler.
  • Disallow: the path crawlers should not fetch. Anything starting with this string is blocked. /admin/ blocks /admin/users, /admin/login, etc.
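  • Allow: overrides a Disallow. Useful when you block a whole directory but want to permit one path inside, for example Disallow: /api/ + Allow: /api/public/. When both rules match a URL, the longer (more specific) one wins, regardless of the order they appear in.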
  • Sitemap: the full URL of your XML sitemap. Adding it helps crawlers find and prioritize your pages.
  • Crawl-delay: mostly obsolete. Google ignores it entirely; Bing still honors it, and support among other crawlers varies. Drop it unless your hosting genuinely can't keep up.
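
How Disallow and Allow interact can be sketched in a few lines. This assumes the longest-match rule described in RFC 9309 and used by Google, and it handles plain prefixes only (no wildcards). is_allowed is a hypothetical helper, not the analyzer's code.

```python
def is_allowed(rules, path):
    """rules: list of (directive, prefix) pairs, directive is 'allow' or 'disallow'."""
    best_len, allowed = -1, True  # no matching rule means the path may be fetched
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = directive == "allow"
        # The longest matching prefix wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed

rules = [("disallow", "/admin/"), ("disallow", "/api/"), ("allow", "/api/public/")]
print(is_allowed(rules, "/api/public/status"))  # Allow is the longer match
print(is_allowed(rules, "/api/orders"))         # only Disallow: /api/ matches
```

Because the decision depends on match length rather than position, reordering the rules in the file does not change the result.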

Common mistakes the analyzer flags

  1. Disallow: / on User-agent: * — site gets de-indexed. Usually a leftover from staging.
  2. Blocking JS or CSS (e.g. Disallow: /*.js$) — Google can't render the page properly, which can hurt how it evaluates mobile usability and rich results.
  3. Duplicate User-agent groups — declaring rules twice for the same crawler. RFC 9309 says such groups should be merged, and Google does merge them, but some crawlers read only the first group, so behavior is inconsistent.
  4. Rules before any User-agent — orphaned directives are ignored entirely.
  5. Missing Sitemap directive — minor, but you should always point crawlers at your sitemap.
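
The five checks above are simple enough to sketch directly. The following is a simplified illustration, not the analyzer's actual implementation; lint and its message strings are made up for this example.

```python
import re

def lint(text):
    """Return a list of (line_no, message); line_no is None for file-level findings."""
    findings = []
    current_agents = []  # agents of the group currently being read
    seen_agents = set()  # every agent named so far in the file
    in_rules = False     # True once the current group has at least one rule
    has_sitemap = False
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            agent = value.lower()
            if agent in seen_agents:
                findings.append((line_no, f"duplicate group for {value}"))
            seen_agents.add(agent)
            current_agents.append(agent)
        elif field in ("allow", "disallow"):
            if not current_agents:
                findings.append((line_no, "rule before any User-agent"))
                continue
            in_rules = True
            if field == "disallow":
                if value == "/" and "*" in current_agents:
                    findings.append((line_no, "Disallow: / for all crawlers"))
                if re.search(r"\.(js|css)\$?$", value):
                    findings.append((line_no, "blocks JS or CSS"))
        elif field == "sitemap":
            has_sitemap = True
    if not has_sitemap:
        findings.append((None, "no Sitemap directive"))
    return findings
```

Calling lint on a file's text returns an empty list when none of the five patterns is present.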

FAQ

Where does robots.txt go?

At the root of your domain — https://yourdomain.com/robots.txt. Subdomains need their own file at their own root (https://blog.yourdomain.com/robots.txt). Crawlers do not look anywhere else.
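
To make the "root of each host" rule concrete, here is a small sketch that derives the robots.txt location for any page URL using Python's standard urllib.parse. robots_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://blog.yourdomain.com/posts/1?x=2#top"))
print(robots_url("https://yourdomain.com/shop/cart"))
```

The subdomain URL resolves to its own file, which is why a rule in the main domain's robots.txt has no effect on blog.yourdomain.com.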

Does robots.txt prevent pages from showing in Google?

Not reliably. It prevents crawling, but if a page is linked from elsewhere, Google can still index the URL based on the link context, just without your page content. To keep a page out of search results, use a noindex meta tag (or an X-Robots-Tag header) instead, and make sure the page is not also blocked in robots.txt: a crawler that can't fetch the page never sees the noindex.
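
As an illustration of the header variant, the sketch below is a minimal WSGI app that adds X-Robots-Tag: noindex to responses under a made-up /drafts/ path. The path and the app itself are assumptions for the example; the equivalent in HTML is a meta tag with name "robots" and content "noindex" in the page head.

```python
def app(environ, start_response):
    """Minimal WSGI app: mark everything under /drafts/ (hypothetical path) as noindex."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if environ.get("PATH_INFO", "").startswith("/drafts/"):
        # The crawler must be able to fetch the page to see this header,
        # so /drafts/ must not be disallowed in robots.txt.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it on port 8000.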

Should I block AI crawlers like GPTBot and PerplexityBot?

Depends on your goals. Blocking them keeps your content out of model training and out of AI search citations — which usually hurts AI Search visibility more than it helps. Most sites should let them in. Block selectively if you have legal/IP reasons or if your business model depends on people landing on your pages directly.
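
If you do decide to block selectively, the rules are short. This sketch generates one group that names each crawler and disallows everything for them; block_agents is a hypothetical helper, and the agent names are the ones mentioned in the question above.

```python
def block_agents(agents):
    """Build one robots.txt group that disallows all paths for the named crawlers."""
    lines = [f"User-agent: {name}" for name in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

print(block_agents(["GPTBot", "PerplexityBot"]))
```

Crawlers not named in the group are unaffected, so search engine crawlers keep following whatever the rest of the file says.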

What's the difference between Disallow and noindex?

Disallow tells crawlers 'don't fetch this URL'. noindex tells crawlers 'fetch this URL but don't show it in search results'. Use noindex when you want pages off Google but still want crawlers to follow links from them. Use Disallow for paths you really don't want crawled at all (private APIs, admin panels).

How long until Google picks up changes to my robots.txt?

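Google generally caches robots.txt for up to 24 hours, so changes can take about a day to be picked up. The standalone robots.txt Tester has been retired; to speed things up, request a recrawl from the robots.txt report in Search Console (Settings → Crawling).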

Can I have wildcards in robots.txt?

Yes. * matches any sequence of characters, and $ anchors the match to the end of the URL. Disallow: /*.pdf$ blocks every URL ending in .pdf anywhere on the site. Disallow: /search?* blocks every URL starting with /search? (the trailing * is redundant, because rules already match by prefix). Most modern crawlers support both.
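
One way to see what these patterns match is to translate them into regular expressions. This is a minimal sketch of that idea, assuming * and $ are the only special characters; pattern_to_regex and matches are hypothetical names.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where each '*' was.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def matches(pattern, path):
    # re.match anchors at the start, which gives the prefix behaviour of robots.txt rules.
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*.pdf$", "/docs/report.pdf"))             # ends in .pdf
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))  # $ rejects the query string
print(matches("/search?*", "/search?q=robots"))
```

Note that the ? in /search?* is a literal question mark here, not a regex quantifier, which is why the pieces are escaped before joining.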
