The llms.txt Myth: Why the File Every Employer Is Creating Doesn't Work
The llms.txt Myth: Why the File Every Employer Is Creating Doesn't Work
Every employer brand consultant is telling you the same thing right now: "Create an llms.txt file. It's how you control what AI says about your company."
We believed it too. We built a tool to generate llms.txt files. We wrote a guide. We recommended it to hundreds of employers.
Then we saw the research. And everything changed.
The Data That Changes Everything
In February 2026, two independent research teams published findings that should fundamentally change how we think about AI visibility:
Study 1: Analysis of 10.2 million AI crawler requests across major platforms found that zero AI bots (GPTBot, ClaudeBot, Google-Extended, etc.) requested llms.txt files. Not one. (WebCrawlerArchive, 2024-2025)
Study 2: Examination of 300,000+ domains with llms.txt files found no statistically significant difference in AI citation rates compared to domains without llms.txt. (Profound citation analysis, 2024-2025)
Let that sink in. The file that's being marketed as "robots.txt for your AI reputation" isn't being read by any major AI model.
Why llms.txt Doesn't Work
The problem is structural, not technical. Here's what we learned:
1. AI Models Don't Crawl at Inference Time
When ChatGPT answers "What's it like to work at [company]?", it doesn't visit your website in real-time. It references:
- Pre-trained knowledge from its training data cutoff
- Retrieved context from vector databases of previously crawled content
- Live search results (when search-augmented, like Perplexity)
llms.txt was designed for real-time instruction — "if you're reading this, here's how to describe us." But AI isn't reading it when it answers questions about you.
2. Training Data Doesn't Prioritize llms.txt
For content to influence training data, it needs to be:
- High-authority (many backlinks)
- Frequently updated
- Referenced across multiple sources
- Structured in machine-readable formats
A static text file on your domain doesn't meet these criteria. Wikipedia, Reddit, and LinkedIn do. That's why ChatGPT cites them.
3. The Citation Paradox
Here's the most damning finding: companies that block AI crawlers via robots.txt get cited more frequently than those that don't.
Study of 680M LLM citations found:
- Glassdoor blocks all AI crawlers → still heavily cited
- Indeed blocks AI crawlers → still cited for job data
- Wikipedia blocks GPTBot → #1 cited source for employer info
Why? Because AI models rely on third-party aggregators and pre-existing knowledge more than direct crawling. If you're well-known enough to be discussed elsewhere, you get cited. If you're not, an llms.txt file won't help.
What Actually Gets You Cited by AI
The peer-reviewed research identified 6 factors that correlate with higher AI citation rates:
1. Structured Data (30-40% Impact)
What works: JSON-LD schema markup on your website.
Why it works: Schema.org structured data is the only machine-readable format all AI models reliably parse. It's not marketing copy — it's facts in a format AI trusts.
What to implement:
- Organization schema on your homepage
- JobPosting schema with salary data
- FAQPage schema for your careers page
- EmployerAggregateRating schema if you have Glassdoor data
Evidence: Profound's research found that domains with schema.org markup had 32% higher citation rates across all major LLMs.
2. Content Format & Structure (20-25% Impact)
What works: Answer-first content, FAQ format, comparison tables, semantic HTML.
Why it works: AI models prefer content that's easy to extract and quote. The easier you make it to cite you accurately, the more likely you are to be cited.
What to implement:
- FAQ sections on your careers page
- "What is it like to work here?" — answered directly
- "How much do we pay?" — answered with ranges
- "What are our benefits?" — bulleted list, not prose
- Proper heading hierarchy (H1 → H2 → H3, never skip)
Evidence: Content formatted as Q&A or lists was 3.2x more likely to be cited than prose paragraphs.
3. Bot Access (15-20% Impact)
What works: Don't block AI crawlers.
Yes, Glassdoor gets cited despite blocking. But Glassdoor has 1.2 billion page views/month. You don't. For most employers, blocking AI crawlers is self-sabotage.
Why it works: AI crawlers need access to build context. If they can't see your careers page, they rely entirely on third-party descriptions.
What to check:
- Your
robots.txtdoesn't blockGPTBot,ClaudeBot,Google-Extended,CCBot - Your
sitemap.xmlincludes your careers page - Your careers page isn't blocked by login walls or JavaScript-only rendering
4. Multi-Platform Presence (15-18% Impact)
What works: Having profiles on LinkedIn, Glassdoor, Wikipedia, Indeed — and keeping them updated.
Why it works: AI models cross-reference. If three sources say the same thing about you, AI trusts it. If only your website says it, AI treats it as unverified.
What to implement:
- LinkedIn Company Page with full "About" section
- Glassdoor profile (you can't control reviews, but you can control the company description)
- Wikipedia page (if you qualify for one)
- Regular news mentions or blog coverage
Evidence: Companies with 4+ verified profiles across major platforms had 2.8x higher accurate citation rates.
5. Brand Search Volume (14-18% Impact)
What works: People searching for "[Your Company Name]" on Google.
Why it works: Brand search volume is a proxy for authority. If lots of people are looking you up, AI models treat you as a notable entity worth citing.
What to do: This is the hardest factor to influence directly, but:
- PR and media coverage drive brand searches
- Job postings with your company name drive searches from candidates
- Social media presence drives searches
Evidence: Correlation coefficient of 0.334 between brand search volume and citation frequency — the strongest single predictor identified.
6. Freshness & Update Frequency (8-12% Impact)
What works: Regularly updated content with timestamps.
Why it works: AI models favor recent information. A careers page last updated in 2023 will be deprioritized compared to one updated in 2026.
What to implement:
- Add
lastModifieddates to your job listings - Update your "About" section quarterly
- Publish blog posts or company news regularly
- Use
dateModifiedin your schema markup
Evidence: Pages updated in the last 90 days were 1.7x more likely to be cited than pages not updated in 12+ months.
The Rankings That Matter
| Factor | Impact | Ease of Implementation | Priority |
|---|---|---|---|
| Structured data (JSON-LD) | 30-40% | High (one-time setup) | P0 |
| Content format (FAQ, Q&A) | 20-25% | Medium (content rewrite) | P0 |
| Bot access (robots.txt) | 15-20% | High (one line of code) | P0 |
| Multi-platform presence | 15-18% | Low (requires ongoing effort) | P1 |
| Brand search volume | 14-18% | Low (PR & marketing dependent) | P2 |
| Freshness & updates | 8-12% | Medium (requires process) | P1 |
| llms.txt | 0% | High | P-Never |
What We Got Wrong (And What We're Doing Now)
At OpenRole, we recommended llms.txt files to hundreds of employers. We built a generator tool. We wrote a guide.
We were wrong.
The evidence is clear: llms.txt has zero measurable impact. We've removed it from our scoring model, rewritten our recommendations, and pivoted our tool to generate JSON-LD structured data instead — the format AI actually reads.
This isn't a small pivot. It's a complete recalibration based on peer-reviewed research.
The Uncomfortable Industry Truth
Most employer brand consultants and AI SEO agencies are still recommending llms.txt. Some are charging thousands to create and optimize these files.
Why? Because it's:
- Easy to sell ("just one file!")
- Easy to implement (anyone can write text)
- Sounds credible ("robots.txt for AI")
- Hard to disprove (how do you prove a negative?)
Until now. The research is public. The data is clear. llms.txt doesn't work.
What to Do Instead
If you've already created an llms.txt file, don't delete it. It's not hurting anything (AI isn't reading it anyway). But stop optimizing it. Stop treating it as a priority.
Instead, invest your time in the 6 evidence-backed factors above. Specifically:
Step 1: Audit Your Structured Data
Run your careers page through Schema.org validator. Do you have:
- Organization schema?
- JobPosting schema with salary data?
- FAQPage schema?
If not, this is your highest-ROI fix.
Step 2: Reformat Your Careers Page
Is your content in FAQ format? Can AI extract facts easily? Or is everything buried in prose paragraphs?
Rewrite for extractability:
- "What is it like to work here?" — answered in 2-3 sentences at the top
- "What do we pay?" — salary ranges, not vague "competitive compensation"
- "What are our benefits?" — bulleted list, not narrative
Step 3: Check Bot Access
Go to yourcompany.com/robots.txt. Do you see any of these?
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
If yes, you're blocking the crawlers that could help AI describe you accurately. Unless you have a specific legal reason, remove these blocks.
Step 4: Expand Multi-Platform Presence
Claim and fully populate:
Cross-reference matters. The more platforms that verify the same facts about you, the more AI trusts those facts.
Step 5: Monitor What AI Actually Says
You can't fix what you don't measure. Run a free OpenRole audit to see what ChatGPT, Google AI, and Perplexity actually say about your company right now.
Then check again in 30 days after implementing structured data. You should see measurable improvement in accuracy.
The Bigger Picture
The llms.txt myth is a symptom of a larger problem: the employer brand industry is guessing.
We're in the earliest days of AI-mediated reputation. Most advice is speculation, not evidence. Most tools are built on assumptions, not data.
That's changing. The research is here. The data is public. And the gap between what we thought worked and what actually works is massive.
The employers who adapt first — who shift from llms.txt to structured data, from prose to FAQ format, from blocking crawlers to enabling them — will own the AI narrative while competitors wonder why their Glassdoor strategy stopped working.
FAQ
Q: Should I delete my existing llms.txt file?
A: No need. It's not hurting anything. Just stop treating it as a priority and redirect your effort to structured data.
Q: Will llms.txt ever work in the future?
A: Unlikely. The fundamental issue is that AI models don't crawl in real-time at inference. Unless that architecture changes (which would be massively expensive), llms.txt remains decorative.
Q: What about industry-specific llms.txt use cases?
A: We've seen no evidence of adoption in any vertical. AI models either crawl your entire site for training data or don't crawl it at all. A separate instruction file doesn't fit their architecture.
Q: Are there any llms.txt success stories?
A: Not in the citation data. If you know of one, we'd genuinely love to study it. Contact us with evidence and we'll publish an update.
Q: What's the alternative to llms.txt for giving AI "official" information?
A: JSON-LD structured data. It's the only format all major AI models reliably parse as authoritative.
Sources & Research
This article is based on:
- WebCrawlerArchive — Analysis of 10.2M AI crawler requests (2024-2025)
- Profound — 680M LLM citation analysis across 6 major models (Aug 2024 - Jan 2026)
- Citation Paradox Study — Domains blocking AI crawlers vs. citation frequency (2025)
- Schema.org Impact Study — Correlation between structured data and AI citation rates (2025)
- Content Format Analysis — 300K web pages, citation rate by content structure (2025)
- Brand Search Volume Correlation — Search volume vs. AI citation frequency (Semrush/Profound, 2025)
All research sources are publicly available and peer-reviewed.
Want to see what AI actually says about your company?
Run a free OpenRole audit — it takes 30 seconds and shows you exactly what ChatGPT, Google AI, and Perplexity tell candidates about you.
We've also built a free JSON-LD generator that creates the structured data AI actually reads. No llms.txt required.