March 24, 2026 · 5 min read

AI Crawler Access: How We Audit All 14 AI Crawlers

A look inside GEO Auditor's crawler engine — what it checks beyond robots.txt, the two tiers of AI crawlers, and why llms.txt and server-rendering matter for access.


Every major AI search engine operates its own web crawler. Before ChatGPT, Perplexity, or Gemini can cite your site, their crawler has to be able to read it. GEO Auditor's crawler access engine checks all 14 known AI user-agents against your site's access controls — and the results are often surprising.

What the crawlers engine checks

The audit inspects every layer that can block a crawler, not just robots.txt:

  • robots.txt rules — per-agent Allow and Disallow directives, plus blanket wildcard blocks
  • Meta robots tags — noindex, nofollow, and AI-specific directives like noai and noimageai
  • AI files — whether llms.txt, llms-full.txt, and ai.txt are present to help AI models understand your site
  • Sitemap presence — a sitemap helps crawlers discover all your pages, not just the homepage
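The meta-tag layer from the list above can be checked with a few lines of stdlib Python. This is a minimal sketch, not GEO Auditor's actual implementation; the agent-specific meta names it looks for are illustrative:

```python
from html.parser import HTMLParser

# Directives that block or restrict AI/search crawlers when found
# in a robots (or agent-specific) meta tag.
BLOCKING = {"noindex", "nofollow", "noai", "noimageai"}

class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # "robots" applies to all crawlers; per-agent names are illustrative.
        if attrs.get("name", "").lower() in ("robots", "gptbot", "claudebot"):
            content = attrs.get("content", "")
            self.directives |= {d.strip().lower() for d in content.split(",")}

page = '<head><meta name="robots" content="noindex, noai"></head>'
p = MetaRobotsParser()
p.feed(page)
print(sorted(p.directives & BLOCKING))  # ['noai', 'noindex']
```

A page can pass robots.txt and still be invisible: a single noindex in this tag overrides an Allow rule, which is why the audit inspects both layers.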

The 14 AI crawlers we check

Not all AI crawlers are equal. GEO Auditor groups them into two tiers:

Tier 1 — high-impact AI search crawlers: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Amazonbot. These directly power the AI search engines that users query today. Blocking any of these has an immediate, measurable impact on your AI search visibility.

Tier 2 — training and secondary crawlers: FacebookBot, CCBot, anthropic-ai, Bytespider, Googlebot, Bingbot. These feed AI training pipelines and secondary platforms. Blocking them has less immediate impact, but contributes to long-term AI model awareness of your brand.
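Checking all 14 user-agents against a robots.txt file can be sketched with Python's stdlib urllib.robotparser. The tier lists are copied from above; the real engine checks more layers than this:

```python
from urllib import robotparser

TIER_1 = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
          "PerplexityBot", "Google-Extended", "Applebot-Extended", "Amazonbot"]
TIER_2 = ["FacebookBot", "CCBot", "anthropic-ai", "Bytespider",
          "Googlebot", "Bingbot"]

def audit_robots(robots_txt: str, path: str = "/") -> dict:
    """Return {user-agent: allowed?} for all 14 AI crawlers."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in TIER_1 + TIER_2}

example = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""
results = audit_robots(example)
print(results["GPTBot"])    # True: explicit group overrides the wildcard
print(results["ClaudeBot"])  # False: caught by the blanket block
```

Running this against your own robots.txt shows in one pass which tier a misconfiguration hits: a wildcard Disallow flips every agent without a named group to False.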

The blanket block problem

The most common crawler issue we find is a wildcard block that accidentally catches all AI crawlers. It typically looks like this in robots.txt:

User-agent: *
Disallow: /

This was a reasonable pattern a decade ago, when the goal was to prevent content scraping. Today, it silently blocks every AI search platform. A site with this rule will score 0 on Tier 1 crawler access — invisible to ChatGPT, Perplexity, Gemini, and Claude simultaneously.

The fix is to add an explicit group with Allow: / for each AI crawler above the wildcard block. Crawlers obey the most specific matching User-agent group, so a group that names a crawler directly always overrides the wildcard.
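A corrected robots.txt looks like this (agent names from the Tier 1 list above; add one group per crawler you want to admit):

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# ...one group per remaining Tier 1 crawler

User-agent: *
Disallow: /

The wildcard block at the bottom still keeps out everything you haven't named, so the original scraping protection stays in place.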

Why llms.txt and ai.txt matter

Beyond access control, the crawlers engine checks whether you've added files that actively help AI models understand your site:

  • llms.txt — a Markdown summary of your site designed specifically for LLMs, listing your most important pages with brief descriptions
  • llms-full.txt — an extended version with the full text of key pages, useful for documentation-heavy sites
  • ai.txt — an alternative format supported by some crawlers that includes usage rights and content preferences

These files don't guarantee citations, but they signal that your site is AI-ready — and they give crawlers a clear map of what's worth reading.
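For reference, a minimal llms.txt follows the structure used by the llms.txt convention — an H1 site name, a blockquote summary, then sections of annotated links. The names and URLs here are placeholders:

# Example Co
> Example Co makes widgets. This file lists the pages most useful to LLMs.

## Key pages
- [Pricing](https://example.com/pricing): plans and feature comparison
- [Docs](https://example.com/docs): product documentation and API reference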

How the crawlers score is calculated

The score combines four components:

  • Tier 1 crawler access (weighted heavily — these are the AI search crawlers)
  • Tier 2 crawler access
  • Absence of blanket blocks
  • AI file presence (llms.txt, sitemap)

A perfect crawlers score requires all Tier 1 crawlers unblocked, no blanket block, and at least an llms.txt and sitemap in place.
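The four components combine roughly like the sketch below. The weights are hypothetical — the post doesn't publish GEO Auditor's exact formula — but the shape matches the description: Tier 1 dominates, and a blanket block or missing AI files caps the total.

```python
# Hypothetical weights, illustrative only; Tier 1 is weighted heaviest.
W_TIER1, W_TIER2, W_NO_BLOCK, W_FILES = 0.5, 0.2, 0.15, 0.15
TIER1_TOTAL, TIER2_TOTAL = 8, 6  # crawler counts from the tiers above

def crawlers_score(tier1_ok: int, tier2_ok: int, blanket_block: bool,
                   has_llms_txt: bool, has_sitemap: bool) -> int:
    """Combine the four audit components into a 0-100 score."""
    score = (W_TIER1 * tier1_ok / TIER1_TOTAL
             + W_TIER2 * tier2_ok / TIER2_TOTAL
             + W_NO_BLOCK * (not blanket_block)
             + W_FILES * (has_llms_txt + has_sitemap) / 2)
    return round(100 * score)

print(crawlers_score(8, 6, False, True, True))   # 100: everything in place
print(crawlers_score(0, 0, True, False, False))  # 0: blanket-blocked site
```

With weights like these, unblocking Tier 1 crawlers alone recovers half the score — which is why that's the first fix to make.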

Check your crawler access score

The free GEO Auditor scan includes a full crawler access breakdown — showing the status of all 14 AI crawlers, whether you have a blanket block, and which AI files are present. It takes about 45 seconds and requires no signup.


Check your site

See how visible your site is to AI search

Free GEO score across all 6 AI platforms. No signup. Results in 45 seconds.

Run a free audit →