GEO Robots.txt Generator
Configure AI crawler access for optimal GEO visibility
📚 Educational Tool
This generates the AI crawler section of your robots.txt. Append to your existing file. Always test in staging first.
📖 Key Concepts Explained
Allow — Full access. The crawler can visit all pages (except blocked paths) at any speed. Best for maximum visibility.
Rate-Limit — Restricted speed. The crawler can visit pages but must wait X seconds between requests (Crawl-delay). Use for training crawlers that consume server resources.
Block — No access. The crawler cannot visit any pages. You become invisible to that AI system. Use for aggressive or unwanted crawlers.
Crawl-delay — Seconds the crawler must wait between page requests. Higher values = less server load but slower indexing. Typical: 2-10s for search crawlers, 10-30s for training crawlers.
User-agent: * — A wildcard rule that applies to ALL crawlers not specifically listed. Acts as a "catch-all" fallback.
Training crawlers — Bulk-collect content to train future AI models. Your content becomes part of model weights but is not directly attributed. High server load, indirect long-term value. robots.txt is your control surface for opting out. Consider rate-limiting.
Search Index crawlers — Proactively crawl your site to build a searchable index, like Googlebot does for Google Search. When users later ask questions, the AI searches this pre-built index. robots.txt is your control surface for search visibility. Prioritize allowing these.
User Fetcher (RAG) crawlers — Fire only when a human user asks a question, fetching your page in real-time to augment that specific response. This is Retrieval-Augmented Generation. robots.txt compliance varies by company — some honor it (Claude-User), some don't (ChatGPT-User). Allow, but don't rely on robots.txt for blocking.
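Put together, the directives above combine into a robots.txt section like the following. This is a sketch: GPTBot, PerplexityBot, and CCBot are real crawler user-agent tokens, but the rules shown are illustrative, and note that Crawl-delay is an informal extension that not every crawler honors.

```
# Training crawler: rate-limited (one request every 10 seconds)
User-agent: GPTBot
Crawl-delay: 10
Allow: /

# Search index crawler: full access for visibility
User-agent: PerplexityBot
Allow: /

# Unwanted bulk crawler: blocked entirely
User-agent: CCBot
Disallow: /

# Catch-all fallback for any crawler not listed above
User-agent: *
Allow: /
```

Rules are matched by the most specific User-agent group, so the `*` fallback only applies to crawlers with no group of their own.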
1. Select Your Strategy
🚀
Maximize Visibility
Allow all crawlers for maximum AI citations
⚖️
Balanced
Allow search index + RAG, rate-limit training
🛡️
Conservative
Search index + RAG only, block training
2. Configure AI Crawlers
Training
Bulk model training — indirect long-term value
Search Index
Proactive indexing for AI search — controllable via robots.txt
User Fetcher (RAG)
Live retrieval at query time — robots.txt may not apply
3. Paths to Block
Pages that should NOT be crawled by AI systems (applied to all non-blocked crawlers)
Comma-separated paths. Include trailing slash.
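For example, assuming hypothetical paths `/admin/` and `/internal/`, the generator emits a Disallow line per path inside each non-blocked crawler's group:

```
User-agent: GPTBot
Disallow: /admin/
Disallow: /internal/
Allow: /
```

The trailing slash matters: `Disallow: /admin/` blocks everything under that directory, while `Disallow: /admin` also blocks any path that merely starts with that string (e.g. `/administrator`).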
4. Additional Options
Helps crawlers find important pages
Rule for unlisted crawlers
⚠️ Compliance Note
robots.txt is a voluntary standard: legitimate crawlers respect it, but compliance is not guaranteed. Known gaps:
ChatGPT-User does not follow robots.txt as of December 2025 (OpenAI treats it as user-initiated browsing, not an automated crawler).
Perplexity-User may ignore robots.txt for user-initiated requests.
DeepSeek's real-time fetcher does not identify itself in its user-agent string and cannot be blocked via robots.txt.
Agentic AI browsers (ChatGPT Atlas, Perplexity Comet) use standard Chrome user-agents with no distinguishing token.
For these cases, consider IP-level firewall rules or behavioral detection via Cloudflare or your CDN. For crisis or sensitive content, assume that once a page is publicly accessible, AI assistants may fetch and quote it immediately.
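Where robots.txt is ignored, enforcement has to move to the server or CDN layer. A minimal nginx sketch of user-agent matching (the token list is illustrative, not complete):

```nginx
# Return 403 to requests whose User-Agent contains a known AI crawler token.
# This only catches crawlers that identify themselves; agentic browsers using
# standard Chrome user-agents require IP-based or behavioral detection instead.
if ($http_user_agent ~* "(GPTBot|CCBot|Bytespider)") {
    return 403;
}
```

Self-identifying crawlers can spoof this header, so treat it as a first filter, not a guarantee.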
Generated robots.txt
0 Allowed
0 Rate-Limited
0 Blocked