How to Audit What ChatGPT, Perplexity, and Claude Say About Your Company in 2026
Before spending money on an employer-brand AI audit tool, run a manual audit yourself. It costs nothing, takes a focused 90 minutes, and gives you a calibrated baseline for what AI currently says about your company — plus a visceral sense of which models are most off, which queries are most damaging, and where the biggest gaps sit. Even if you eventually buy a tool, the manual audit teaches you how to read the results.
This post is the step-by-step methodology. Skip to the prompt bank in the middle if you just want to run the queries.
What you're measuring
An AI visibility audit answers four questions per platform:
- Does the model know about us? (Does it return a substantive answer to our company name?)
- What does it get right? (Facts it states that match reality.)
- What does it get wrong? (Facts it states that contradict reality — hallucinations or stale data.)
- Who is it citing? (Which sources underpin its answer?)
Answers to all four change per platform. The audit needs to cover all four major systems because their behaviour diverges meaningfully.
What you need
- 90 minutes of uninterrupted time. Don't try to do this in 5-minute chunks; the patterns only emerge with fresh context.
- Access to ChatGPT, Perplexity, Google AI Mode (Gemini), and Claude. Free tiers are fine.
- A blank spreadsheet with the columns described below.
- Your actual employer data to hand — salary bands, benefits, recent announcements, etc. — so you can grade accuracy.
The prompt bank
Run these 12 queries, verbatim, on each platform. Don't improvise — consistency across platforms is the point. Replace [COMPANY] with your company name.
Awareness / general
1. What is [COMPANY]? Describe the company in 100 words.
2. What does [COMPANY] do?
3. Is [COMPANY] a good company to work for?
Compensation
4. What's the average salary at [COMPANY]?
5. How does [COMPANY] compare to [three named competitors] on compensation?
6. Does [COMPANY] publish salary ranges in its job listings?
Culture / working conditions
7. What's the work-life balance like at [COMPANY]?
8. Does [COMPANY] offer remote work? What's their hybrid policy?
9. How does [COMPANY] treat its employees? Any concerns?
Process / hiring
10. What's the interview process like at [COMPANY]?
11. How long does it take to get hired at [COMPANY]?
12. What should I know before interviewing at [COMPANY]?
Optional: add 2–3 sector-specific queries for your industry. A fintech might add "Is [COMPANY] regulated by the FCA?"; a healthcare company might add "Does [COMPANY] recruit NHS-trained staff?".
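If you'd rather not hand-edit twelve prompts four times over, a few lines of Python can fill in the placeholders for you. This is a minimal sketch: the company name and competitor names below are made up, and `fill_prompts` is a hypothetical helper, not part of any tool.

```python
# The 12 prompts from the bank above, in order (prompt #1 to #12).
PROMPT_BANK = [
    "What is [COMPANY]? Describe the company in 100 words.",
    "What does [COMPANY] do?",
    "Is [COMPANY] a good company to work for?",
    "What's the average salary at [COMPANY]?",
    "How does [COMPANY] compare to [three named competitors] on compensation?",
    "Does [COMPANY] publish salary ranges in its job listings?",
    "What's the work-life balance like at [COMPANY]?",
    "Does [COMPANY] offer remote work? What's their hybrid policy?",
    "How does [COMPANY] treat its employees? Any concerns?",
    "What's the interview process like at [COMPANY]?",
    "How long does it take to get hired at [COMPANY]?",
    "What should I know before interviewing at [COMPANY]?",
]


def fill_prompts(company, competitors):
    """Return the 12 prompts with both placeholders replaced."""
    rivals = ", ".join(competitors)
    return [
        p.replace("[COMPANY]", company).replace("[three named competitors]", rivals)
        for p in PROMPT_BANK
    ]


# Hypothetical company and competitors, for illustration only.
prompts = fill_prompts("Acme Ltd", ["Globex", "Initech", "Umbrella"])
for number, prompt in enumerate(prompts, start=1):
    print(f"{number}. {prompt}")
```

Paste the printed prompts as-is on every platform so the wording stays identical across all four.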
The spreadsheet
Set up the following columns. Each row is one prompt × one platform = one observation.
| Column | Description |
|---|---|
| Platform | ChatGPT / Perplexity / Gemini / Claude |
| Prompt # | 1–12 |
| Answered? | Y / N |
| Substantive? | Y / N (N if "I don't have information about...") |
| Facts stated | Free-text list of the specific factual claims |
| Accuracy | Grade each fact: ✓ correct / ✗ wrong / ? unverifiable |
| Citations | Source URLs named or linked |
| Sentiment | Positive / neutral / negative / mixed |
| Notes | Anything unusual |
12 prompts × 4 platforms = 48 rows. That's the dataset.
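A blank sheet with all 48 rows pre-filled saves a few minutes of setup. The sketch below builds one as CSV text using only the standard library; the column names mirror the table above, and `build_blank_rows` and `to_csv` are hypothetical helper names.

```python
import csv
import io

PLATFORMS = ["ChatGPT", "Perplexity", "Gemini", "Claude"]
COLUMNS = [
    "Platform", "Prompt #", "Answered?", "Substantive?", "Facts stated",
    "Accuracy", "Citations", "Sentiment", "Notes",
]


def build_blank_rows(platforms=PLATFORMS, prompt_count=12):
    """One empty observation per platform and prompt number."""
    rows = []
    for platform in platforms:
        for number in range(1, prompt_count + 1):
            row = dict.fromkeys(COLUMNS, "")
            row["Platform"] = platform
            row["Prompt #"] = number
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


blank_rows = build_blank_rows()
csv_text = to_csv(blank_rows)
print(f"{len(blank_rows)} rows ready to fill in")
```

Write `csv_text` to a file and import it into whichever spreadsheet you use; the remaining columns are filled in by hand as you run the audit.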
Running the audit
On each platform, open a fresh session (clear history / new chat). Paste each prompt as-is. Copy the answer into your spreadsheet verbatim. Don't react to the answer in the chat; don't ask follow-ups. The point is a clean baseline.
Tip for Perplexity: the citations appear as links next to each claim — capture the domains and titles in the Citations column.
Tip for ChatGPT: if web search (Browse) is used, the answer draws on freshly fetched pages; if there is no sign of a search, the answer comes from the underlying model and may be months out of date. Note which case applies.
Tip for Gemini / AI Mode: use the AI Overview that appears, not the linked results below it. The AI Overview is what a candidate actually reads.
Tip for Claude: use whichever Claude tier you have (Free / Pro / Enterprise); note which in the spreadsheet, as the behaviour varies.
Expect 60–75 minutes for data collection if you're moving efficiently.
Grading
Once the sheet is populated, go row by row and grade each "Facts stated" entry as correct, wrong, or unverifiable. This is where the hidden work happens — you need your own authoritative data as the ground truth.
Score the platform overall on four dimensions:
| Dimension | Calculation |
|---|---|
| Coverage | % of prompts where the platform returned a substantive answer |
| Accuracy | # correct facts / total facts stated |
| Citation health | % of prompts where at least one of your owned sources was cited |
| Sentiment | (Positive answers − negative answers) / total answers, giving a score from -1 to +1 |
You now have a 4-platform × 4-dimension matrix. That's your baseline.
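The four calculations can be sketched in a few lines once each row has been graded. Two things here are assumptions rather than part of the method as stated: sentiment is computed as (positive rows minus negative rows) divided by all rows, and "total facts" includes unverifiable ones. The `Observation` fields and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One graded spreadsheet row (one prompt on one platform)."""
    substantive: bool
    correct_facts: int
    total_facts: int          # assumption: includes unverifiable facts
    cites_owned_source: bool
    sentiment: str            # "positive", "neutral", "negative" or "mixed"


def score_platform(rows):
    """Return the four dimension scores for one platform's rows."""
    if not rows:
        raise ValueError("no observations to score")
    n = len(rows)
    total_facts = sum(r.total_facts for r in rows)
    correct = sum(r.correct_facts for r in rows)
    positive = sum(r.sentiment == "positive" for r in rows)
    negative = sum(r.sentiment == "negative" for r in rows)
    return {
        "coverage": sum(r.substantive for r in rows) / n,
        "accuracy": correct / total_facts if total_facts else None,
        "citation_health": sum(r.cites_owned_source for r in rows) / n,
        # Assumed formula: net share of positive rows, from -1 to +1.
        "sentiment": (positive - negative) / n,
    }


# Four made-up rows for a single platform.
sample = [
    Observation(True, 3, 4, True, "positive"),
    Observation(True, 2, 2, False, "neutral"),
    Observation(False, 0, 0, False, "neutral"),
    Observation(True, 1, 2, False, "negative"),
]
scores = score_platform(sample)
print(scores)
```

Run it once per platform to fill in one column of the matrix. Accuracy is returned as `None` when a platform stated no facts at all, so a silent platform is not mistaken for a perfectly accurate one.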
What the output usually shows (2026 UK mid-market)
Across the 500+ manual audits we've seen, the typical UK mid-market pattern looks like this:
| Dimension | ChatGPT | Perplexity | Gemini | Claude |
|---|---|---|---|---|
| Coverage | 85% | 88% | 78% | 72% |
| Accuracy | 62% | 71% | 68% | 74% |
| Citation health | 34% | 42% | 29% | 38% |
| Sentiment | +0.2 | +0.1 | +0.1 | 0.0 |
Translations:
- Coverage is usually highest on Perplexity and ChatGPT, lowest on Claude (which is more conservative about companies it has thin data on).
- Accuracy is often highest on Claude (it prefers not to guess), lowest on ChatGPT (it will invent plausibly).
- Citation health is terrible everywhere — in most cases, third-party sources dominate citations.
- Sentiment usually sits near neutral, pulled down by the minority of employers with unfavourable review histories.
The big insight: your scorecard almost certainly shows that third-party sources (Glassdoor, Comparably, LinkedIn, Reddit) are cited far more often than your own content. This is the single biggest action item for most employers: make your own content more citable.
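To see that imbalance in your own data, tally the domains in your Citations column. The sketch below assumes you have collected the cited URLs into a list; the URLs and the owned domain `acme.example` are invented, and `citation_share` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_share(urls, owned_domains):
    """Count citations per domain and return the share that point at owned domains."""
    domains = []
    for url in urls:
        # urlparse only fills netloc when the URL includes a scheme (https://...).
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    counts = Counter(domains)
    total = sum(counts.values())
    # Exact-match only: list subdomains such as careers.acme.example explicitly.
    owned = sum(count for domain, count in counts.items() if domain in owned_domains)
    return counts, (owned / total if total else 0.0)


# Made-up citations copied from the spreadsheet.
cited_urls = [
    "https://www.glassdoor.co.uk/Reviews/acme",
    "https://www.reddit.com/r/example/comments/abc",
    "https://acme.example/careers",
    "https://www.glassdoor.co.uk/Salary/acme",
    "https://www.linkedin.com/company/acme",
]
counts, owned_share = citation_share(cited_urls, {"acme.example"})
print(counts.most_common(3), f"owned share: {owned_share:.0%}")
```

The per-domain counts tell you which third-party sources to look at first when tracing a wrong fact back to its origin.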
Common patterns you'll notice
After auditing a handful of companies, patterns emerge:
Hallucinated salaries. All four models will confidently state specific salary figures. Many of these are wrong, out-of-date, or invented. Salary is the category where ChatGPT in particular will make up numbers that sound plausible. If your published salaries don't exist in machine-readable form, you're vulnerable here.
Stale leadership. Models frequently cite former executives as current. CEO changes, CHRO changes, head-of-engineering changes can take 6–18 months to propagate through model knowledge.
Interview process fiction. The interview questions in "what's the interview process like" answers often come from 5-year-old Glassdoor reports, dated patterns, or pure inference. This is high-leverage to fix because a candidate directly uses the answer to prepare.
Wrong remote policy. Hybrid-policy shifts propagate slowly. Models often state 2022-era RTO policies for companies that have since shifted.
Industry hallucination. Claude is slightly less prone to this but all four models will sometimes describe companies as being in adjacent industries they're not actually in.
From audit to action
Once you have the baseline, the action list writes itself. For each wrong fact or missing citation:
- Wrong fact? Find the source the model is citing, publish a correction on your own site, add structured data so your version becomes citable.
- Your content not cited? Add schema, add FAQs, add dated claims, allow AI crawlers, add an llms.txt.
- Covered by third parties you don't like? Publish a first-party equivalent (your own review content, employee testimonials, dated policy documentation).
The pattern: you can't always demote what's currently cited, but you can reliably add weight to your own content until you displace it in citation share.
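As one illustration of "add schema, add FAQs, add dated claims", the sketch below builds schema.org `FAQPage` markup as JSON-LD. The questions, answers, and date are placeholders, `faq_jsonld` is a hypothetical helper, and nothing here guarantees that any particular model will pick the markup up; it simply puts your version of the facts in a machine-readable, dated form.

```python
import json


def faq_jsonld(qa_pairs, date_modified):
    """Build a schema.org FAQPage document from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "dateModified": date_modified,  # dated claim: when these answers were last true
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


# Placeholder content for a hypothetical employer.
faq_doc = faq_jsonld(
    [
        ("What is the hybrid working policy?",
         "As of January 2026, staff work from the office two days a week."),
        ("Are salary ranges published in job listings?",
         "Yes, every listing includes the salary band."),
    ],
    date_modified="2026-01-15",
)
print(json.dumps(faq_doc, indent=2))
```

The printed JSON goes inside a `<script type="application/ld+json">` tag on the page that carries the same questions and answers in visible text.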
When to move from manual to automated
The manual audit gives you the baseline. Run it again in 8 weeks to see movement. If you're committing to AI visibility work as an ongoing discipline — not a one-off project — automation becomes worth it at roughly the point where you're auditing quarterly across a large prompt set, multiple brands, or multiple geographies. That's the point an AI visibility audit tool saves significant time and gives you historical tracking the manual approach can't.
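When you rerun the audit, the useful number is the change per platform and dimension, not the new score on its own. A minimal sketch, assuming both runs are stored as nested dictionaries of scores; the figures below are invented and `score_movement` is a hypothetical helper.

```python
def score_movement(baseline, rerun):
    """Per-platform, per-dimension change: rerun score minus baseline score."""
    return {
        platform: {
            dimension: round(rerun[platform][dimension] - value, 3)
            for dimension, value in dimensions.items()
        }
        for platform, dimensions in baseline.items()
    }


# Made-up scores for two runs, eight weeks apart.
baseline = {
    "ChatGPT": {"coverage": 0.85, "accuracy": 0.62},
    "Claude": {"coverage": 0.72, "accuracy": 0.74},
}
rerun = {
    "ChatGPT": {"coverage": 0.85, "accuracy": 0.70},
    "Claude": {"coverage": 0.80, "accuracy": 0.74},
}
movement = score_movement(baseline, rerun)
for platform, deltas in movement.items():
    print(platform, deltas)
```

Small movements can be session-to-session noise rather than real change, so treat single-digit percentage shifts with caution.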
The key is: don't buy the tool without running the manual audit first. Teams that skip the manual step often misread the tool's dashboards because they don't have the intuition for what the numbers actually mean.
Frequently Asked Questions
Q: How often should I run this audit?
A: Quarterly for most mid-market employers; monthly for brands actively running content or hiring campaigns; weekly for enterprise brands with high hiring volume or multiple geographies. Automated tools become cost-effective once you audit more often than quarterly.
Q: Do I need all four platforms, or just the biggest one?
A: All four. They diverge enough that a ChatGPT-only audit will miss issues that Gemini, Perplexity, or Claude surface. The entire point of cross-platform auditing is to catch platform-specific gaps.
Q: What if the models refuse to answer about my company?
A: Note this as low coverage — it usually means your company has thin machine-parseable data online. The fix is the same as for inaccurate data: publish more structured, canonical content. Low coverage tends to resolve within 2–3 crawler cycles of a content push.
Q: The models gave different answers on the same prompt an hour later. Which do I record?
A: Record the first answer of a fresh session. Inter-session variance is common but usually narrow. For more rigorous baselining, run each prompt 3 times in separate fresh sessions, grade each run, and average the scores, but that triples audit time for modest precision gains.
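If you do take the three-run route, averaging is straightforward once each run has its own set of scores. A minimal sketch with invented figures; `average_runs` is a hypothetical helper, and the spread it reports is simply the gap between the highest and lowest run.

```python
from statistics import mean


def average_runs(run_scores):
    """Average each dimension across repeated runs and report the spread."""
    summary = {}
    for dimension in run_scores[0]:
        values = [run[dimension] for run in run_scores]
        summary[dimension] = {
            "mean": mean(values),
            "spread": max(values) - min(values),
        }
    return summary


# Made-up scores from three fresh sessions on one platform.
runs = [
    {"coverage": 0.75, "accuracy": 0.60},
    {"coverage": 0.75, "accuracy": 0.70},
    {"coverage": 0.75, "accuracy": 0.65},
]
averaged = average_runs(runs)
print(averaged)
```

A wide spread on a dimension is itself worth recording in the Notes column, since it means a single-run baseline for that platform is less reliable.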
Q: Is there a difference between using the free and paid tiers?
A: Usually minor. Paid tiers typically have better search grounding and higher usage limits, and behaviour can vary by tier, so note which one you used. Free tiers are fine for audit purposes.
Q: Can AI visibility audit data be used for compliance?
A: Partially. Dated, reproducible AI visibility records are one input to compliance documentation under regulations that require evidence of AI outputs about people — but they don't directly discharge AI Act or similar obligations. See our EU AI Act checklist for the full compliance picture.
Run an automated AI visibility audit once you've established your manual baseline — same methodology, across 60+ prompts and 6 AI models, with historical tracking.