AI Sitemap – LLMs.txt
What is an AI Sitemap (LLMs.txt)?
An AI Sitemap, often referred to as LLMs.txt, is a proposed standard similar to robots.txt, but designed specifically for Large Language Models (LLMs) and AI crawlers. The idea is to give website owners control over how their content is accessed and used by AI models for training, inference, or indexing.
📄 What is LLMs.txt?
It’s a plain text file placed at the root of a website (e.g., https://cbstoryhub.com/LLMs.txt) that tells AI systems:
- What parts of the site they can or cannot crawl.
- How they can use the data (e.g., for training, summarizing, or indexing).
- Which specific AI agents or LLMs the rules apply to.
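Because the format (as proposed here) reuses robots.txt-style directives, the file is easy to produce programmatically. Here is a minimal Python sketch that emits a file like the example later in this article; the rules dictionary and the output path are hypothetical placeholders, not part of any standard:

```python
# Minimal sketch: render an LLMs.txt file from per-bot rules.
# The User-Agent / Disallow / Allow directives mirror the example
# shown later in this article; the RULES dict itself is hypothetical.
RULES = {
    "GPTBot": [("Disallow", "/private-data/"), ("Allow", "/")],
    "ClaudeBot": [("Disallow", "/")],
    "*": [("Disallow", "/no-ai/")],
}

def render_llms_txt(rules):
    """Render one User-Agent block per bot, separated by blank lines."""
    blocks = []
    for agent, directives in rules.items():
        lines = [f"User-Agent: {agent}"]
        lines += [f"{directive}: {path}" for directive, path in directives]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    # Write to the web root so the file is served at /LLMs.txt.
    with open("LLMs.txt", "w", encoding="utf-8") as f:
        f.write(render_llms_txt(RULES))
```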
🧠 Why It Matters
As AI models grow in power and data appetite, website owners want more granular control over their content. This is especially important for:
- Privacy and compliance (e.g., GDPR).
- Content ownership concerns.
- Avoiding misuse of proprietary or sensitive data.
🧾 Example LLMs.txt
```
User-Agent: GPTBot
Disallow: /private-data/
Allow: /

User-Agent: ClaudeBot
Disallow: /

User-Agent: *
Disallow: /no-ai/
```
- GPTBot (used by OpenAI) can access everything except /private-data/.
- ClaudeBot (Anthropic’s crawler) is fully disallowed.
- All other bots must avoid /no-ai/.
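Because this example borrows robots.txt syntax, you can sanity-check the rules with Python's built-in robots.txt parser before publishing the file. This is a sketch that assumes the file follows robots.txt semantics exactly; the bot names and paths come from the example above, and SomeOtherBot is just a stand-in for any unlisted crawler:

```python
from urllib import robotparser

# The example LLMs.txt from above, inlined so the check is self-contained.
LLMS_TXT = """\
User-Agent: GPTBot
Disallow: /private-data/
Allow: /

User-Agent: ClaudeBot
Disallow: /

User-Agent: *
Disallow: /no-ai/
"""

parser = robotparser.RobotFileParser()
parser.parse(LLMS_TXT.splitlines())

# GPTBot may read everything except /private-data/.
print(parser.can_fetch("GPTBot", "/private-data/report.html"))  # False
print(parser.can_fetch("GPTBot", "/blog/post.html"))            # True

# ClaudeBot is fully disallowed.
print(parser.can_fetch("ClaudeBot", "/blog/post.html"))         # False

# Any other bot falls through to the "*" rules and must avoid /no-ai/.
print(parser.can_fetch("SomeOtherBot", "/no-ai/page.html"))     # False
print(parser.can_fetch("SomeOtherBot", "/blog/post.html"))      # True
```

Whether a real crawler actually honors these rules is up to the crawler, as the comparison below notes.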
🔍 Comparison to robots.txt
| Feature | robots.txt | LLMs.txt |
| --- | --- | --- |
| Purpose | Web search crawling | AI/LLM content usage |
| Focus | Indexing for search | Training/inference |
| Bot examples | Googlebot, Bingbot | GPTBot, ClaudeBot |
| Enforceable? | Not legally, but widely respected | Same; compliance depends on the bot operator |
[Image: "The AI Bouncer Your Website Deserves" (generated with DALL·E)]

Thank you for reading and sharing!

Source: OpenAI’s ChatGPT language models and DALL·E.