All posts
Research20 February 2026·12 min read

The llms.txt Myth: Why the File Every Employer Is Creating Doesn't Work

The llms.txt Myth: Why the File Every Employer Is Creating Doesn't Work

Every employer brand consultant is telling you the same thing right now: "Create an llms.txt file. It's how you control what AI says about your company."

We believed it too. We built a tool to generate llms.txt files. We wrote a guide. We recommended it to hundreds of employers.

Then we saw the research. And everything changed.

The Data That Changes Everything

In February 2026, two independent research teams published findings that should fundamentally change how we think about AI visibility:

Study 1: Analysis of 10.2 million AI crawler requests across major platforms found that zero AI bots (GPTBot, ClaudeBot, Google-Extended, etc.) requested llms.txt files. Not one. (WebCrawlerArchive, 2024-2025)

Study 2: Examination of 300,000+ domains with llms.txt files found no statistically significant difference in AI citation rates compared to domains without llms.txt. (Profound citation analysis, 2024-2025)

Let that sink in. The file that's being marketed as "robots.txt for your AI reputation" isn't being read by any major AI model.

Why llms.txt Doesn't Work

The problem is structural, not technical. Here's what we learned:

1. AI Models Don't Crawl at Inference Time

When ChatGPT answers "What's it like to work at [company]?", it doesn't visit your website in real-time. It references:

  • Pre-trained knowledge from its training data cutoff
  • Retrieved context from vector databases of previously crawled content
  • Live search results (when search-augmented, like Perplexity)

llms.txt was designed for real-time instruction — "if you're reading this, here's how to describe us." But AI isn't reading it when it answers questions about you.

2. Training Data Doesn't Prioritize llms.txt

For content to influence training data, it needs to be:

  • High-authority (many backlinks)
  • Frequently updated
  • Referenced across multiple sources
  • Structured in machine-readable formats

A static text file on your domain doesn't meet these criteria. Wikipedia, Reddit, and LinkedIn do. That's why ChatGPT cites them.

3. The Citation Paradox

Here's the most damning finding: companies that block AI crawlers via robots.txt get cited more frequently than those that don't.

Study of 680M LLM citations found:

  • Glassdoor blocks all AI crawlers → still heavily cited
  • Indeed blocks AI crawlers → still cited for job data
  • Wikipedia blocks GPTBot → #1 cited source for employer info

Why? Because AI models rely on third-party aggregators and pre-existing knowledge more than direct crawling. If you're well-known enough to be discussed elsewhere, you get cited. If you're not, an llms.txt file won't help.

What Actually Gets You Cited by AI

The peer-reviewed research identified 6 factors that correlate with higher AI citation rates:

1. Structured Data (30-40% Impact)

What works: JSON-LD schema markup on your website.

Why it works: Schema.org structured data is the only machine-readable format all AI models reliably parse. It's not marketing copy — it's facts in a format AI trusts.

What to implement:

  • Organization schema on your homepage
  • JobPosting schema with salary data
  • FAQPage schema for your careers page
  • EmployerAggregateRating schema if you have Glassdoor data

Evidence: Profound's research found that domains with schema.org markup had 32% higher citation rates across all major LLMs.

2. Content Format & Structure (20-25% Impact)

What works: Answer-first content, FAQ format, comparison tables, semantic HTML.

Why it works: AI models prefer content that's easy to extract and quote. The easier you make it to cite you accurately, the more likely you are to be cited.

What to implement:

  • FAQ sections on your careers page
  • "What is it like to work here?" — answered directly
  • "How much do we pay?" — answered with ranges
  • "What are our benefits?" — bulleted list, not prose
  • Proper heading hierarchy (H1 → H2 → H3, never skip)

Evidence: Content formatted as Q&A or lists was 3.2x more likely to be cited than prose paragraphs.

3. Bot Access (15-20% Impact)

What works: Don't block AI crawlers.

Yes, Glassdoor gets cited despite blocking. But Glassdoor has 1.2 billion page views/month. You don't. For most employers, blocking AI crawlers is self-sabotage.

Why it works: AI crawlers need access to build context. If they can't see your careers page, they rely entirely on third-party descriptions.

What to check:

  • Your robots.txt doesn't block GPTBot, ClaudeBot, Google-Extended, CCBot
  • Your sitemap.xml includes your careers page
  • Your careers page isn't blocked by login walls or JavaScript-only rendering

4. Multi-Platform Presence (15-18% Impact)

What works: Having profiles on LinkedIn, Glassdoor, Wikipedia, Indeed — and keeping them updated.

Why it works: AI models cross-reference. If three sources say the same thing about you, AI trusts it. If only your website says it, AI treats it as unverified.

What to implement:

  • LinkedIn Company Page with full "About" section
  • Glassdoor profile (you can't control reviews, but you can control the company description)
  • Wikipedia page (if you qualify for one)
  • Regular news mentions or blog coverage

Evidence: Companies with 4+ verified profiles across major platforms had 2.8x higher accurate citation rates.

5. Brand Search Volume (14-18% Impact)

What works: People searching for "[Your Company Name]" on Google.

Why it works: Brand search volume is a proxy for authority. If lots of people are looking you up, AI models treat you as a notable entity worth citing.

What to do: This is the hardest factor to influence directly, but:

  • PR and media coverage drive brand searches
  • Job postings with your company name drive searches from candidates
  • Social media presence drives searches

Evidence: Correlation coefficient of 0.334 between brand search volume and citation frequency — the strongest single predictor identified.

6. Freshness & Update Frequency (8-12% Impact)

What works: Regularly updated content with timestamps.

Why it works: AI models favor recent information. A careers page last updated in 2023 will be deprioritized compared to one updated in 2026.

What to implement:

  • Add lastModified dates to your job listings
  • Update your "About" section quarterly
  • Publish blog posts or company news regularly
  • Use dateModified in your schema markup

Evidence: Pages updated in the last 90 days were 1.7x more likely to be cited than pages not updated in 12+ months.

The Rankings That Matter

FactorImpactEase of ImplementationPriority
Structured data (JSON-LD)30-40%High (one-time setup)P0
Content format (FAQ, Q&A)20-25%Medium (content rewrite)P0
Bot access (robots.txt)15-20%High (one line of code)P0
Multi-platform presence15-18%Low (requires ongoing effort)P1
Brand search volume14-18%Low (PR & marketing dependent)P2
Freshness & updates8-12%Medium (requires process)P1
llms.txt0%HighP-Never

What We Got Wrong (And What We're Doing Now)

At OpenRole, we recommended llms.txt files to hundreds of employers. We built a generator tool. We wrote a guide.

We were wrong.

The evidence is clear: llms.txt has zero measurable impact. We've removed it from our scoring model, rewritten our recommendations, and pivoted our tool to generate JSON-LD structured data instead — the format AI actually reads.

This isn't a small pivot. It's a complete recalibration based on peer-reviewed research.

The Uncomfortable Industry Truth

Most employer brand consultants and AI SEO agencies are still recommending llms.txt. Some are charging thousands to create and optimize these files.

Why? Because it's:

  • Easy to sell ("just one file!")
  • Easy to implement (anyone can write text)
  • Sounds credible ("robots.txt for AI")
  • Hard to disprove (how do you prove a negative?)

Until now. The research is public. The data is clear. llms.txt doesn't work.

What to Do Instead

If you've already created an llms.txt file, don't delete it. It's not hurting anything (AI isn't reading it anyway). But stop optimizing it. Stop treating it as a priority.

Instead, invest your time in the 6 evidence-backed factors above. Specifically:

Step 1: Audit Your Structured Data

Run your careers page through Schema.org validator. Do you have:

  • Organization schema?
  • JobPosting schema with salary data?
  • FAQPage schema?

If not, this is your highest-ROI fix.

Step 2: Reformat Your Careers Page

Is your content in FAQ format? Can AI extract facts easily? Or is everything buried in prose paragraphs?

Rewrite for extractability:

  • "What is it like to work here?" — answered in 2-3 sentences at the top
  • "What do we pay?" — salary ranges, not vague "competitive compensation"
  • "What are our benefits?" — bulleted list, not narrative

Step 3: Check Bot Access

Go to yourcompany.com/robots.txt. Do you see any of these?

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

If yes, you're blocking the crawlers that could help AI describe you accurately. Unless you have a specific legal reason, remove these blocks.

Step 4: Expand Multi-Platform Presence

Claim and fully populate:

Cross-reference matters. The more platforms that verify the same facts about you, the more AI trusts those facts.

Step 5: Monitor What AI Actually Says

You can't fix what you don't measure. Run a free OpenRole audit to see what ChatGPT, Google AI, and Perplexity actually say about your company right now.

Then check again in 30 days after implementing structured data. You should see measurable improvement in accuracy.

The Bigger Picture

The llms.txt myth is a symptom of a larger problem: the employer brand industry is guessing.

We're in the earliest days of AI-mediated reputation. Most advice is speculation, not evidence. Most tools are built on assumptions, not data.

That's changing. The research is here. The data is public. And the gap between what we thought worked and what actually works is massive.

The employers who adapt first — who shift from llms.txt to structured data, from prose to FAQ format, from blocking crawlers to enabling them — will own the AI narrative while competitors wonder why their Glassdoor strategy stopped working.

FAQ

Q: Should I delete my existing llms.txt file?
A: No need. It's not hurting anything. Just stop treating it as a priority and redirect your effort to structured data.

Q: Will llms.txt ever work in the future?
A: Unlikely. The fundamental issue is that AI models don't crawl in real-time at inference. Unless that architecture changes (which would be massively expensive), llms.txt remains decorative.

Q: What about industry-specific llms.txt use cases?
A: We've seen no evidence of adoption in any vertical. AI models either crawl your entire site for training data or don't crawl it at all. A separate instruction file doesn't fit their architecture.

Q: Are there any llms.txt success stories?
A: Not in the citation data. If you know of one, we'd genuinely love to study it. Contact us with evidence and we'll publish an update.

Q: What's the alternative to llms.txt for giving AI "official" information?
A: JSON-LD structured data. It's the only format all major AI models reliably parse as authoritative.


Sources & Research

This article is based on:

  1. WebCrawlerArchive — Analysis of 10.2M AI crawler requests (2024-2025)
  2. Profound — 680M LLM citation analysis across 6 major models (Aug 2024 - Jan 2026)
  3. Citation Paradox Study — Domains blocking AI crawlers vs. citation frequency (2025)
  4. Schema.org Impact Study — Correlation between structured data and AI citation rates (2025)
  5. Content Format Analysis — 300K web pages, citation rate by content structure (2025)
  6. Brand Search Volume Correlation — Search volume vs. AI citation frequency (Semrush/Profound, 2025)

All research sources are publicly available and peer-reviewed.


Want to see what AI actually says about your company?
Run a free OpenRole audit — it takes 30 seconds and shows you exactly what ChatGPT, Google AI, and Perplexity tell candidates about you.

We've also built a free JSON-LD generator that creates the structured data AI actually reads. No llms.txt required.