Free tool
AI Robots Config Helper
Decide which AI crawlers can read your site, with the tradeoffs spelled out. Pick a posture, copy the robots.txt block, done.
robots.txt · paste at your-site.com/robots.txt
# AI crawler rules generated by RANK & RECOMMEND
# Recommended: blocks training-only crawlers, allows live and search bots that cite you.

# ---- Blocked: training-only crawlers ----

# OpenAI: Crawls the public web to train ChatGPT models
User-agent: GPTBot
Disallow: /

# Anthropic: Crawls the public web to train Claude models
User-agent: ClaudeBot
Disallow: /

# Anthropic: Older Anthropic crawler identifier (training)
User-agent: anthropic-ai
Disallow: /

# Google: Controls whether your content trains Gemini and Vertex AI
User-agent: Google-Extended
Disallow: /

# Apple: Trains Apple Intelligence models
User-agent: Applebot-Extended
Disallow: /

# Common Crawl: General crawl used to train many open and closed LLMs
User-agent: CCBot
Disallow: /

# ByteDance: Aggressive crawler used for TikTok / Doubao training
User-agent: Bytespider
Disallow: /

# Meta: Trains Llama and other Meta AI
User-agent: Meta-ExternalAgent
Disallow: /

# Cohere: Crawls for Cohere model training
User-agent: cohere-ai
Disallow: /

# ---- Allowed: live and search bots that can cite you ----

# OpenAI: Live fetch when a ChatGPT user clicks a link in an answer
User-agent: ChatGPT-User
Allow: /

# OpenAI: Indexes pages for ChatGPT Search and inline citations
User-agent: OAI-SearchBot
Allow: /

# Anthropic: Live fetch when a Claude user opens a link in an answer
User-agent: Claude-User
Allow: /

# Anthropic: Indexes pages for Claude's search and citations
User-agent: Claude-SearchBot
Allow: /

# Perplexity: Indexes pages for Perplexity answers and source citations
User-agent: PerplexityBot
Allow: /

# Perplexity: Live fetch when a Perplexity user runs a query
User-agent: Perplexity-User
Allow: /

# Google: Various Google products, including some AI research
User-agent: GoogleOther
Allow: /

# Meta: Indexes pages for Facebook link previews
User-agent: FacebookBot
Allow: /

# DuckDuckGo: Powers DuckDuckGo's AI assistant answers
User-agent: DuckAssistBot
Allow: /

# Diffbot: Knowledge graph crawler used by AI products
User-agent: Diffbot
Allow: /
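Before deploying, it can help to confirm that the rules parse the way you expect. The sketch below is one way to do that with Python's standard-library `urllib.robotparser`; it is not part of the tool. It uses a shortened excerpt of the block above, and the `check_access` helper, the `SomeOtherBot` agent, and the `/pricing` URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the rules above: two blocked and two allowed agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "SomeOtherBot" is a made-up agent with no matching group in the rules.
result = check_access(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "ChatGPT-User", "PerplexityBot", "SomeOtherBot"],
    "https://your-site.com/pricing",
)

for agent, allowed in result.items():
    print(f"{agent}: {'ALLOWED' if allowed else 'BLOCKED'}")
```

Agents with `Disallow: /` come back as blocked, and agents with `Allow: /` as allowed. An agent that matches no group is also reported as allowed, because this excerpt has no `User-agent: *` fallback.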
ALL TRACKED AI CRAWLERS
| USER-AGENT | OWNER | PURPOSE | IN THIS CONFIG |
|---|---|---|---|
| GPTBot | OpenAI | Crawls the public web to train ChatGPT models | BLOCKED |
| ChatGPT-User | OpenAI | Live fetch when a ChatGPT user clicks a link in an answer | ALLOWED |
| OAI-SearchBot | OpenAI | Indexes pages for ChatGPT Search and inline citations | ALLOWED |
| ClaudeBot | Anthropic | Crawls the public web to train Claude models | BLOCKED |
| Claude-User | Anthropic | Live fetch when a Claude user opens a link in an answer | ALLOWED |
| Claude-SearchBot | Anthropic | Indexes pages for Claude's search and citations | ALLOWED |
| anthropic-ai | Anthropic | Older Anthropic crawler identifier (training) | BLOCKED |
| PerplexityBot | Perplexity | Indexes pages for Perplexity answers and source citations | ALLOWED |
| Perplexity-User | Perplexity | Live fetch when a Perplexity user runs a query | ALLOWED |
| Google-Extended | Google | Controls whether your content trains Gemini and Vertex AI | BLOCKED |
| GoogleOther | Google | Various Google products, including some AI research | ALLOWED |
| Applebot-Extended | Apple | Trains Apple Intelligence models | BLOCKED |
| CCBot | Common Crawl | General crawl used to train many open and closed LLMs | BLOCKED |
| Bytespider | ByteDance | Aggressive crawler used for TikTok / Doubao training | BLOCKED |
| Meta-ExternalAgent | Meta | Trains Llama and other Meta AI | BLOCKED |
| FacebookBot | Meta | Indexes pages for Facebook link previews | ALLOWED |
| DuckAssistBot | DuckDuckGo | Powers DuckDuckGo's AI assistant answers | ALLOWED |
| cohere-ai | Cohere | Crawls for Cohere model training | BLOCKED |
| Diffbot | Diffbot | Knowledge graph crawler used by AI products | ALLOWED |
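The table maps directly onto the robots.txt block: each row becomes a comment, a `User-agent` line, and either `Disallow: /` or `Allow: /`. The sketch below illustrates that mapping for a four-row subset of the table. It is an assumption about how such a block could be produced, not the tool's actual generator, and `CRAWLERS` and `render_robots` are hypothetical names.

```python
# Subset of the table above: user-agent -> (owner, purpose, blocked in this config?)
CRAWLERS = {
    "GPTBot": ("OpenAI", "Crawls the public web to train ChatGPT models", True),
    "OAI-SearchBot": ("OpenAI", "Indexes pages for ChatGPT Search and inline citations", False),
    "Google-Extended": ("Google", "Controls whether your content trains Gemini and Vertex AI", True),
    "PerplexityBot": ("Perplexity", "Indexes pages for Perplexity answers and source citations", False),
}


def render_robots(crawlers: dict[str, tuple[str, str, bool]]) -> str:
    """Render one robots.txt group per crawler, blocked groups first."""
    blocked, allowed = [], []
    for agent, (owner, purpose, is_blocked) in crawlers.items():
        rule = "Disallow: /" if is_blocked else "Allow: /"
        group = f"# {owner}: {purpose}\nUser-agent: {agent}\n{rule}\n"
        (blocked if is_blocked else allowed).append(group)
    sections = [
        "# ---- Blocked: training-only crawlers ----",
        *blocked,
        "# ---- Allowed: live and search bots that can cite you ----",
        *allowed,
    ]
    return "\n".join(sections)


robots_text = render_robots(CRAWLERS)
print(robots_text)
```

Flipping the boolean for a row moves that crawler between the blocked and allowed sections, which is all a different posture amounts to in this sketch.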
MONITOR WHO ACTUALLY OBEYS YOUR ROBOTS.TXT
Members get an AI Crawler Visit Log that shows which AI crawlers hit which paths, plus alerts when one ignores your robots.txt. Join the waitlist for the crawler log and the free GEO Playbook.
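For a rough idea of what checking compliance involves, the sketch below scans web server log lines for requests from crawlers that this config blocks. It is not the members' Visit Log. It assumes the Apache/Nginx "combined" log format, that every agent in `BLOCKED_AGENTS` has `Disallow: /`, and that the user-agent string is trustworthy (it can be spoofed). The sample log lines, IP addresses, and the `find_violations` helper are made up for illustration.

```python
import re

# Assumption: every agent listed here has "Disallow: /" in robots.txt.
BLOCKED_AGENTS = {"GPTBot", "ClaudeBot", "CCBot", "Bytespider"}

# Assumption: "combined" log format, i.e.
# ... "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def find_violations(log_lines, blocked_agents=BLOCKED_AGENTS):
    """Return (agent, path) pairs where a blocked crawler requested a page."""
    violations = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        path = match.group("path")
        user_agent = match.group("ua").lower()
        if path == "/robots.txt":
            continue  # reading robots.txt itself is expected behaviour
        for agent in sorted(blocked_agents):
            if agent.lower() in user_agent:
                violations.append((agent, path))
                break
    return violations


# Made-up sample entries for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
]

violations = find_violations(sample_log)
for agent, path in violations:
    print(f"{agent} requested {path} despite Disallow: /")
```

In this sample only the Bytespider request to `/pricing` is flagged: the GPTBot line is a fetch of `/robots.txt`, and ChatGPT-User is an allowed agent.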