If you've ever added a rule like Disallow: / to your robots.txt to block scrapers, there's a good chance you've also blocked every major AI search engine — including ChatGPT, Perplexity, Gemini, and Claude — from reading your site.
This is one of the most common and most impactful GEO mistakes. A blocked crawler means no citations, regardless of how good your content is.
How to check if you're blocking AI crawlers
Visit your robots.txt file directly:
https://yourdomain.com/robots.txt

Look for any of these patterns that would block AI crawlers:
- User-agent: * followed by Disallow: / — blocks everything, including all AI crawlers
- User-agent: GPTBot followed by Disallow: / — explicitly blocks ChatGPT's crawler
- User-agent: ClaudeBot followed by Disallow: / — explicitly blocks Anthropic's crawler
If you're not sure how to read robots.txt syntax, use GEO Auditor's free scan — it checks all 14 AI crawlers against your robots.txt and tells you exactly which ones are blocked.
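If you'd rather check programmatically than eyeball the syntax, Python's standard-library robots.txt parser shows the effect of these patterns directly (a minimal sketch; the sample file content below is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt exhibiting the first blocking pattern above.
sample = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(sample.splitlines())

# The wildcard Disallow applies to every AI crawler too.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, "blocked" if not rp.can_fetch(bot, "/") else "allowed")
```

All three print as blocked: a wildcard rule covers any crawler that has no group of its own.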
The complete list of AI crawlers to allow
There are 14 AI user-agents you should be aware of. Here's what each one is used for:
- GPTBot — OpenAI's primary training and search crawler
- OAI-SearchBot — OpenAI's real-time search crawler (used by ChatGPT search)
- ChatGPT-User — Browsing requests made by ChatGPT during conversations
- ClaudeBot — Anthropic's web crawler (Claude)
- anthropic-ai — Legacy Anthropic crawler identifier
- PerplexityBot — Perplexity AI's search crawler
- Google-Extended — Google's crawler for Gemini and AI Overviews
- Applebot-Extended — Apple's crawler for Apple Intelligence
- Amazonbot — Amazon's AI crawler (Alexa, Rufus)
- FacebookBot — Meta AI crawler
- Bytespider — ByteDance AI crawler
- CCBot — Common Crawl (used to train many open-source AI models)
- Googlebot — Standard Google crawler (also feeds AI Overviews)
- Bingbot — Microsoft's crawler (also feeds Copilot)
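The list above can double as a self-check. A small sketch using urllib.robotparser, which returns whichever of the 14 user-agents a given robots.txt blocks (this is a rough approximation of what an audit tool does, not GEO Auditor's actual logic):

```python
from urllib.robotparser import RobotFileParser

# The 14 user-agents from the list above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "Amazonbot",
    "FacebookBot", "Bytespider", "CCBot", "Googlebot", "Bingbot",
]

def blocked_crawlers(robots_text: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that robots_text blocks from `path`."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return [bot for bot in AI_CRAWLERS if not rp.can_fetch(bot, path)]

print(blocked_crawlers("User-agent: GPTBot\nDisallow: /\n"))  # → ['GPTBot']
```

Paste your own robots.txt contents into `blocked_crawlers` to see which names come back.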
The correct robots.txt setup
If you want to allow all AI crawlers while still blocking malicious scrapers, the safest approach is to explicitly allow the crawlers you want, using a specific User-agent rule for each:
# Allow AI search crawlers
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
# Block generic scrapers
User-agent: *
Disallow: /private/
Disallow: /admin/

This pattern is explicit: specific crawlers get full access, while the wildcard rule restricts only the paths you actually want to protect.
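Before deploying a layout like this, you can sanity-check it with Python's urllib.robotparser (a sketch using a trimmed copy of the file above):

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the recommended robots.txt.
robots = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

print(rp.can_fetch("GPTBot", "/private/"))      # True: named group, full access
print(rp.can_fetch("SomeScraper", "/private/")) # False: wildcard blocks this path
print(rp.can_fetch("SomeScraper", "/blog/"))    # True: other paths stay open
```

The named groups win for the crawlers they name; everything else falls through to the wildcard rules.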
What if you used a wildcard Disallow?
A User-agent: * / Disallow: / rule blocks every robot that doesn't have its own, more specific group in the file. If no group names GPTBot with an Allow: / directive, GPTBot is blocked.

The fix: add an explicit Allow: / group for each AI crawler you want to admit. Under the robots.txt standard (RFC 9309), a compliant crawler obeys only the most specific User-agent group that matches it, wherever that group appears in the file, so a named group takes precedence over the wildcard.
Firewall rules and JavaScript challenges
Robots.txt isn't the only place crawlers get blocked. If you're using Cloudflare or another CDN/WAF, check whether any firewall rules target bots. Common culprits:
- Cloudflare's "Bot Fight Mode" — can intercept legitimate AI crawlers before they reach your server
- JavaScript challenge pages — most AI crawlers don't execute JavaScript, so a JS challenge silently blocks them even when robots.txt allows access
- Rate limiting rules targeting high-frequency crawlers — can intermittently block AI crawlers that request pages quickly
The GEO Auditor free scan checks for these patterns as part of the technical health audit.
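One way to spot a WAF-level block that robots.txt won't reveal is to request a page with a crawler user-agent string and compare status codes. A sketch (yourdomain.com is a placeholder; note that some WAFs fingerprint more than the User-Agent header, so a 200 here doesn't guarantee the real crawler gets through):

```python
import urllib.error
import urllib.request

def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code the server sends for this user-agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 from a firewall rule

# Example usage (placeholder domain):
# for ua in ("GPTBot", "ClaudeBot", "PerplexityBot"):
#     print(ua, fetch_status("https://yourdomain.com/", ua))
```

A 403 or challenge page for the crawler user-agent while a browser user-agent gets 200 points at a firewall rule rather than robots.txt.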
After you fix it
Once you've updated robots.txt, the change takes effect on the crawlers' next fetch of the file — there's nothing to redeploy on your side. Crawlers do cache robots.txt, typically for up to about a day, so expect the update to register within roughly 24 hours. For ChatGPT specifically, OpenAI's crawlers generally reflect robots.txt changes within a few days.
Run a free GEO audit to verify all 14 AI crawlers now show as accessible, and to check whether any other signals are suppressing your AI search visibility.