What llms.txt is

A single Markdown file at the root of your site. Curated. Short. Built specifically for LLMs and AI search engines that need to ingest a site without crawling 200 pages first. The proposal landed at llmstxt.org in late 2024 and got picked up fast by Anthropic's models, Perplexity, and a handful of agentic search products.

The whole thing fits in a single fetch. That's the point.

What we put in ours

agentutility.ai/llms.txt has six sections:

  • a one-sentence pitch
  • a "what this is" paragraph
  • a list of machine-readable indexes (registry.json, endpoints.txt, sitemap.xml, .well-known/agent.json)
  • docs links
  • cluster summaries with counts and price ranges
  • the full endpoint roster, grouped by cluster

That last block is the thing routers actually use. When an LLM is trying to figure out "is there a paid endpoint that does X", it doesn't want to scrape 244 individual pages. It wants one file with 244 link-and-blurb lines. Each line is enough to decide whether to fetch the detail page.
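
To make the router's side concrete, here is a minimal shell sketch of pulling the link-and-blurb lines out of a local copy of the file. It assumes every roster line has the "- [name](url): blurb" shape used in the template later in this post; the function name list_endpoints and the filename in the usage comment are hypothetical, not something we ship.

```shell
# Sketch: turn llms.txt roster lines into tab-separated name, URL, blurb.
# Assumes lines shaped like "- [name](https://...): blurb"; other lines are skipped.
list_endpoints() {
  # $1: path to a local copy of llms.txt
  tab=$(printf '\t')
  sed -n "s/^- \[\([^]]*\)](\([^)]*\)): \(.*\)\$/\1${tab}\2${tab}\3/p" "$1"
}

# Usage (hypothetical filename and keyword):
#   list_endpoints llms.txt | grep -i 'currency'
```

One line of output per endpoint is the whole point: a keyword filter over the blurbs is enough to decide which detail pages are worth a second fetch.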

Brevity vs completeness

We publish two files. /llms.txt is the index. /llms-full.txt is the same thing with the full description, input schema, and output schema for every endpoint inlined. About 30x larger.

Why both? Different consumers want different things. A search router asking "what does agentutility do" wants the short one. A coding agent asking "fetch the full catalog so I can route to the right tool" wants the long one. Publishing both lets each consumer pick.

A rule of thumb we learned: keep the short llms.txt comfortably under 50KB. Above that, we've seen consumers start to truncate it on ingestion. Our short version is currently 18KB. The long one is 540KB and gets fetched maybe a tenth as often, but when it matters it really matters.
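
A size budget is easy to enforce at build time. The sketch below is one way to do it, assuming a POSIX shell; check_size is a hypothetical helper, and the 51200-byte default simply encodes our 50KB figure rather than any documented limit.

```shell
# Sketch: fail a build when llms.txt grows past a byte budget.
# The 50KB default is our own rule of thumb, not a limit from any spec.
check_size() {
  # $1: file path, $2: optional budget in bytes (default 51200 = 50 * 1024)
  limit=${2:-51200}
  # tr strips the leading spaces some wc implementations print
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$limit" ]; then
    echo "FAIL: $1 is $size bytes, over the $limit byte budget"
    return 1
  fi
  echo "OK: $1 is $size bytes (budget $limit)"
}

# Usage (path is an assumption): check_size public/llms.txt
```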

What goes in vs what stays out

In:

  • canonical URLs, not relative paths (a downstream LLM might fetch this file out of context)
  • prices, when they're stable
  • counts and totals
  • explicit cross-links to the other machine-readable surfaces

Out:

  • marketing prose
  • screenshots, embeds, anything that isn't text
  • timestamps that change every minute (the file should be cacheable)
  • anything user-generated

The temptation is to dump everything in. Don't. An LLM ingesting your file is paying for every token. Earn each one.
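
The canonical-URL rule is the easiest of these to check mechanically. A rough sketch, assuming links use standard Markdown "[text](target)" syntax and that grep supports -o (GNU and BSD grep both do); find_relative_links is a hypothetical name.

```shell
# Sketch: print every Markdown link target that is not an absolute http(s) URL.
# Returns non-zero when any are found, so it can gate a deploy.
find_relative_links() {
  # $1: path to llms.txt
  # grep -o emits one match per link, so lines with several links are checked link by link.
  bad=$(grep -o -E ']\([^)]*\)' "$1" \
    | grep -v -E '^]\(https?://' \
    | sed 's/^](\(.*\))$/\1/' || true)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
}

# Usage (path is an assumption): find_relative_links public/llms.txt
```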

Whether it actually works

Hard to measure directly. We don't see crawler-level fetch logs from most of the consumers that matter. What we do see: a ~3x jump in citation rate from Perplexity and ChatGPT browsing once we put /llms.txt up, with nothing else on our side changed. Coincidence is still possible. The bigger signal is that when our endpoint pages get cited, the cited language tracks the llms.txt summaries word-for-word more often than it tracks the page bodies. That's indirect evidence that the file is the ingestion path.

How to roll one out

If you're a paid-API provider:

# minimum viable llms.txt (assumes static files are served from public/)
mkdir -p public
cat > public/llms.txt << 'EOF'
# your-product

> One-sentence description.

## what this is

(One paragraph.)

## endpoints

- [endpoint-1](https://your-site/endpoint-1): one-line description
- [endpoint-2](https://your-site/endpoint-2): one-line description
EOF

Ship it. Watch the citation traffic for a month. Iterate. The first version is rarely the right shape, but the cost of iterating is one file edit, not a site redesign.

What we'd add next

A version field. Right now the format gives a consumer no way to tell whether the file changed without re-fetching and diffing. A simple Version: 2026-05-06 line at the top would reduce that to a one-line comparison. It's an open question for the llmstxt.org maintainers; we'll send a note.
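
On the consumer side, the check could be as small as the sketch below. Everything here is hypothetical: the "Version:" line is the field proposed above and is not part of the llmstxt.org format today, and read_version and has_changed are illustrative names.

```shell
# Sketch: read a hypothetical "Version: YYYY-MM-DD" line from the top of llms.txt
# and compare it with the value recorded on the previous fetch.
read_version() {
  # $1: path to a local copy; only the first 5 lines are inspected
  head -n 5 "$1" | sed -n 's/^Version: *//p' | head -n 1
}

has_changed() {
  # $1: path to llms.txt, $2: version string seen last time
  # A missing version line is treated as "changed", so the consumer re-ingests.
  current=$(read_version "$1")
  [ -z "$current" ] || [ "$current" != "$2" ]
}

# Usage (filename and date are assumptions):
#   if has_changed llms.txt "2026-05-06"; then echo "re-ingest"; fi
```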