Free tool

AI Robots Config Helper

Decide which AI crawlers can read your site, with the tradeoffs spelled out. Pick a posture, copy the robots.txt block, done.

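robots.txt · paste into the file served at your-site.com/robots.txt (the root of your domain); if that file already exists, add this block to it instead of replacing it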

# AI crawler rules generated by RANK & RECOMMEND
# Recommended: blocks training-only crawlers; allows live-fetch and search bots that can cite you, plus a few general-purpose crawlers.

# ---- Blocked: training-only crawlers ----

# OpenAI: Crawls the public web to train ChatGPT models
User-agent: GPTBot
Disallow: /

# Anthropic: Crawls the public web to train Claude models
User-agent: ClaudeBot
Disallow: /

# Anthropic: Older Anthropic crawler identifier (training)
User-agent: anthropic-ai
Disallow: /

# Google: Robots.txt token, not a separate crawler; controls whether your content trains Gemini and Vertex AI without affecting Google Search
User-agent: Google-Extended
Disallow: /

# Apple: Robots.txt token, not a separate crawler; controls whether Applebot-crawled content trains Apple Intelligence models
User-agent: Applebot-Extended
Disallow: /

# Common Crawl: General crawl used to train many open and closed LLMs
User-agent: CCBot
Disallow: /

# ByteDance: Aggressive crawler used for TikTok / Doubao training
User-agent: Bytespider
Disallow: /

# Meta: Trains Llama and other Meta AI
User-agent: Meta-ExternalAgent
Disallow: /

# Cohere: Crawls for Cohere model training
User-agent: cohere-ai
Disallow: /

# ---- Allowed: live-fetch and search bots that can cite you, plus general-purpose crawlers ----

# OpenAI: Live fetch when a ChatGPT user's request makes ChatGPT visit a page
User-agent: ChatGPT-User
Allow: /

# OpenAI: Indexes pages for ChatGPT Search and inline citations
User-agent: OAI-SearchBot
Allow: /

# Anthropic: Live fetch when a Claude user's request makes Claude visit a page
User-agent: Claude-User
Allow: /

# Anthropic: Indexes pages for Claude's search and citations
User-agent: Claude-SearchBot
Allow: /

# Perplexity: Indexes pages for Perplexity answers and source citations
User-agent: PerplexityBot
Allow: /

# Perplexity: Live fetch when a Perplexity user runs a query
User-agent: Perplexity-User
Allow: /

# Google: Generic crawler Google product teams use for one-off fetches and internal research and development
User-agent: GoogleOther
Allow: /

# Meta: Meta web crawler; Facebook link previews are fetched by a separate agent, facebookexternalhit
User-agent: FacebookBot
Allow: /

# DuckDuckGo: Powers DuckDuckGo's AI assistant answers
User-agent: DuckAssistBot
Allow: /

# Diffbot: Knowledge graph crawler used by AI products
User-agent: Diffbot
Allow: /
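If you maintain this list by hand, a small script can regenerate the block whenever you add a crawler or move one between postures. A minimal Python sketch, assuming you keep blocked and allowed user-agent tokens in two lists that mirror the recommended config; the function name `build_robots_txt` is hypothetical, and the output omits the explanatory comments for brevity.

```python
# Hypothetical helper: regenerate the AI-crawler block of robots.txt from two
# lists of user-agent tokens. The lists mirror the recommended posture.
BLOCKED = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "Applebot-Extended",
    "CCBot", "Bytespider", "Meta-ExternalAgent", "cohere-ai",
]
ALLOWED = [
    "ChatGPT-User", "OAI-SearchBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User", "GoogleOther", "FacebookBot",
    "DuckAssistBot", "Diffbot",
]


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """Return one group per token: 'Disallow: /' if blocked, 'Allow: /' if allowed."""
    # A token in both lists would produce contradictory groups, so refuse it.
    overlap = set(blocked) & set(allowed)
    if overlap:
        raise ValueError(f"listed as both blocked and allowed: {sorted(overlap)}")
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    # Blank lines between groups are only for readability.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(BLOCKED, ALLOWED), end="")
```

Keeping one group per token, as the config above does, makes it easy to diff the generated file against what is deployed and to see exactly which crawler changed posture.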

ALL TRACKED AI CRAWLERS

USER-AGENT	OWNER	PURPOSE	IN THIS CONFIG
GPTBot	OpenAI	Crawls the public web to train ChatGPT models	BLOCKED
ChatGPT-User	OpenAI	Live fetch when a ChatGPT user's request makes ChatGPT visit a page	ALLOWED
OAI-SearchBot	OpenAI	Indexes pages for ChatGPT Search and inline citations	ALLOWED
ClaudeBot	Anthropic	Crawls the public web to train Claude models	BLOCKED
Claude-User	Anthropic	Live fetch when a Claude user's request makes Claude visit a page	ALLOWED
Claude-SearchBot	Anthropic	Indexes pages for Claude's search and citations	ALLOWED
anthropic-ai	Anthropic	Older Anthropic crawler identifier (training)	BLOCKED
PerplexityBot	Perplexity	Indexes pages for Perplexity answers and source citations	ALLOWED
Perplexity-User	Perplexity	Live fetch when a Perplexity user runs a query	ALLOWED
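Google-Extended	Google	Robots.txt token (not a separate crawler) that controls whether your content trains Gemini and Vertex AI	BLOCKED
GoogleOther	Google	Generic crawler Google product teams use for one-off fetches and internal research and development	ALLOWED
Applebot-Extended	Apple	Robots.txt token (not a separate crawler) that controls whether your content trains Apple Intelligence models	BLOCKED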
CCBot	Common Crawl	General crawl used to train many open and closed LLMs	BLOCKED
Bytespider	ByteDance	Aggressive crawler used for TikTok / Doubao training	BLOCKED
Meta-ExternalAgent	Meta	Trains Llama and other Meta AI	BLOCKED
FacebookBot	Meta	Meta web crawler; link previews are fetched by a separate agent, facebookexternalhit	ALLOWED
DuckAssistBot	DuckDuckGo	Powers DuckDuckGo's AI assistant answers	ALLOWED
cohere-ai	Cohere	Crawls for Cohere model training	BLOCKED
Diffbot	Diffbot	Knowledge graph crawler used by AI products	ALLOWED
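To confirm that a robots.txt file really produces the verdicts in the "IN THIS CONFIG" column, you can run it through a parser. A minimal sketch using Python's standard-library `urllib.robotparser`; `EXPECTED` copies the table, while `check_config` and the example URL are hypothetical. Python's parser picks the first group whose token appears anywhere in the agent name, which is looser than RFC 9309, so treat this as a sanity check, and pass bare product tokens such as `GPTBot` rather than full User-Agent strings.

```python
from urllib.robotparser import RobotFileParser

# Expected verdicts copied from the table: True = ALLOWED, False = BLOCKED.
EXPECTED = {
    "GPTBot": False, "ChatGPT-User": True, "OAI-SearchBot": True,
    "ClaudeBot": False, "Claude-User": True, "Claude-SearchBot": True,
    "anthropic-ai": False, "PerplexityBot": True, "Perplexity-User": True,
    "Google-Extended": False, "GoogleOther": True, "Applebot-Extended": False,
    "CCBot": False, "Bytespider": False, "Meta-ExternalAgent": False,
    "FacebookBot": True, "DuckAssistBot": True, "cohere-ai": False,
    "Diffbot": True,
}


def check_config(robots_txt: str, url: str = "https://your-site.com/blog/post") -> list[str]:
    """Return the user agents whose can_fetch() result disagrees with EXPECTED."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return [
        agent for agent, allowed in EXPECTED.items()
        if parser.can_fetch(agent, url) != allowed
    ]


if __name__ == "__main__":
    # Build a minimal copy of the config (one group per agent) and check it.
    robots = "\n\n".join(
        f"User-agent: {agent}\n{'Allow' if ok else 'Disallow'}: /"
        for agent, ok in EXPECTED.items()
    )
    print(check_config(robots) or "config matches the table")
```

In practice you would pass the text of your deployed file instead of the generated copy; an empty list means every tracked crawler gets the verdict the table promises.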

MONITOR WHO ACTUALLY OBEYS YOUR ROBOTS.TXT

Members get an AI Crawler Visit Log that shows which AI crawlers hit which paths, plus alerts when one ignores your robots.txt. Join the waitlist for the crawler log and the free GEO Playbook.
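robots.txt is advisory, so the only way to see who obeys it is to check your server logs. A minimal Python sketch of that idea, assuming an Nginx/Apache combined log format and a blanket `Disallow: /` for each blocked token; `find_violations` and `LOG_LINE` are hypothetical names, and this is not a description of how the member Visit Log works. User-Agent headers can be spoofed, so a flagged hit is a lead to verify (for example against the operator's published IP ranges), not proof.

```python
import re

# Combined log format: ip - user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Tokens blocked with "Disallow: /" in the recommended config. Google-Extended and
# Applebot-Extended are left out: they are robots.txt tokens only and never appear
# in a User-Agent header.
BLOCKED_TOKENS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
    "Bytespider", "Meta-ExternalAgent", "cohere-ai",
]


def find_violations(log_lines):
    """Yield (token, ip, path) for blocked crawlers requesting anything but /robots.txt."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        agent = match["agent"].lower()
        path = match["path"]
        for token in BLOCKED_TOKENS:
            # Fetching /robots.txt itself is expected and compliant.
            if token.lower() in agent and path != "/robots.txt":
                yield token, match["ip"], path
                break
```

Feed it an open log file (`find_violations(open("access.log"))`) and group the results by token to see which blocked crawlers are still fetching content pages.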