Every major AI search engine operates its own web crawler. Before ChatGPT, Perplexity, or Gemini can cite your site, their crawler has to be able to read it. GEO Auditor's crawler access engine checks all 14 known AI user-agents against your site's access controls — and the results are often surprising.
What the crawlers engine checks
The audit inspects every layer that can block a crawler, not just robots.txt:
- robots.txt rules — per-agent `Allow` and `Disallow` directives, plus blanket wildcard blocks
- Meta robots tags — `noindex`, `nofollow`, and AI-specific directives like `noai` and `noimageai`
- AI files — whether `llms.txt`, `llms-full.txt`, and `ai.txt` are present to help AI models understand your site
- Sitemap presence — a sitemap helps crawlers discover all your pages, not just the homepage
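In page HTML, the meta robots directives the audit looks for take this shape (an illustrative fragment, not output from the tool):

```html
<!-- Standard robots directives: block indexing and link-following -->
<meta name="robots" content="noindex, nofollow">

<!-- AI-specific opt-out directives honored by some crawlers -->
<meta name="robots" content="noai, noimageai">
```

A `noindex` or `noai` here can block AI crawlers even when robots.txt allows them, which is why the engine checks both layers.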
The 14 AI crawlers we check
Not all AI crawlers are equal. GEO Auditor groups them into two tiers:
Tier 1 — high-impact AI search crawlers: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Amazonbot. These directly power the AI search engines that users query today. Blocking any of these has an immediate, measurable impact on your AI search visibility.
Tier 2 — training and secondary crawlers: FacebookBot, CCBot, anthropic-ai, Bytespider, Googlebot, Bingbot. These feed AI training pipelines and secondary platforms. Blocking them has less immediate impact, but contributes to long-term AI model awareness of your brand.
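The kind of per-agent check the engine performs can be sketched with Python's standard-library robots.txt parser. The function name and structure here are illustrative, not GEO Auditor's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# The 14 AI user-agents from the two tiers above
TIER_1 = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
          "PerplexityBot", "Google-Extended", "Applebot-Extended", "Amazonbot"]
TIER_2 = ["FacebookBot", "CCBot", "anthropic-ai", "Bytespider",
          "Googlebot", "Bingbot"]

def check_access(robots_txt: str, url: str = "/") -> dict[str, bool]:
    """Return {user_agent: allowed} for every AI crawler we track."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in TIER_1 + TIER_2}

# Example: a blanket block with one explicit per-crawler exception
robots = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""
access = check_access(robots)
print(access["GPTBot"])    # True  — its dedicated group overrides the wildcard
print(access["ClaudeBot"]) # False — caught by the blanket block
```

Running this against the example robots.txt shows why a wildcard block is so damaging: every crawler without its own `User-agent` group inherits the `Disallow: /`.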
The blanket block problem
The most common crawler issue we find is a wildcard block that accidentally catches all AI crawlers. It typically looks like this in robots.txt:
```
User-agent: *
Disallow: /
```

This was a reasonable pattern a decade ago, when the goal was to prevent content scraping. Today, it silently blocks every AI search platform. A site with this rule will score 0 on Tier 1 crawler access — invisible to ChatGPT, Perplexity, Gemini, and Claude simultaneously.
The fix is to add an explicit group with `Allow: /` for each AI crawler you want to admit. Under the robots.txt standard, a crawler obeys the most specific `User-agent` group that matches it, so a dedicated group overrides the wildcard regardless of where it appears in the file.
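A corrected robots.txt might look like this — a minimal sketch covering two of the Tier 1 crawlers; extend the pattern to the others you want to admit:

```
# Wildcard group: still blocks unknown scrapers
User-agent: *
Disallow: /

# Dedicated groups override the wildcard for these crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

The blanket block keeps doing its original job while each named group grants that crawler full access.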
Why llms.txt and ai.txt matter
Beyond access control, the crawlers engine checks whether you've added files that actively help AI models understand your site:
- `llms.txt` — a Markdown summary of your site designed specifically for LLMs, listing your most important pages with brief descriptions
- `llms-full.txt` — an extended version with the full text of key pages, useful for documentation-heavy sites
- `ai.txt` — an alternative format supported by some crawlers that includes usage rights and content preferences
These files don't guarantee citations, but they signal that your site is AI-ready — and they give crawlers a clear map of what's worth reading.
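Following the llms.txt proposal, a minimal file is just Markdown: an H1 title, a blockquote summary, then sections of annotated links. The site and URLs below are hypothetical:

```markdown
# Example Corp

> Example Corp builds widgets. The pages below are the most useful
> entry points for language models summarizing or citing this site.

## Docs

- [Getting started](https://example.com/docs/start): Install and first steps
- [API reference](https://example.com/docs/api): Full endpoint reference
```

Served from the site root as `/llms.txt`, this gives a crawler a curated reading list instead of forcing it to infer structure from navigation.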
How the crawlers score is calculated
The score combines four components:
- Tier 1 crawler access (weighted heavily — these are the AI search crawlers)
- Tier 2 crawler access
- Absence of blanket blocks
- AI file presence (llms.txt, sitemap)
A perfect crawlers score requires all Tier 1 crawlers unblocked, no blanket block, and at least an llms.txt and sitemap in place.
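The four components above combine naturally as a weighted sum. The weights and normalization here are hypothetical — GEO Auditor's actual formula is not published — but the sketch shows how the pieces fit:

```python
# Hypothetical weights: Tier 1 access dominates, per the text above
WEIGHTS = {"tier1": 0.5, "tier2": 0.2, "no_blanket_block": 0.15, "ai_files": 0.15}

def crawler_score(tier1_allowed: int, tier2_allowed: int,
                  blanket_block: bool, ai_files_present: int) -> float:
    """Combine the four components into a 0-100 score (illustrative only)."""
    parts = {
        "tier1": tier1_allowed / 8,        # 8 Tier 1 crawlers
        "tier2": tier2_allowed / 6,        # 6 Tier 2 crawlers
        "no_blanket_block": 0.0 if blanket_block else 1.0,
        "ai_files": ai_files_present / 2,  # llms.txt + sitemap
    }
    return round(100 * sum(WEIGHTS[k] * v for k, v in parts.items()), 1)

print(crawler_score(8, 6, False, 2))  # 100.0 — everything in place
print(crawler_score(0, 0, True, 0))   # 0.0  — blanket-blocked site
```

With any weighting of this shape, a blanket block drags both crawler-access components and the no-blanket-block component to zero at once, which is why it is the single most damaging finding.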
Check your crawler access score
The free GEO Auditor scan includes a full crawler access breakdown — showing the status of all 14 AI crawlers, whether you have a blanket block, and which AI files are present. It takes about 45 seconds and requires no signup.