We asked 5 AI models the same 1,000 questions. How often do you think they agreed?
We built a model to generate 1,000 questions that people actually ask.
Not random prompts.
We scraped 50,000 real user queries from search logs, forum threads, and support tickets across 12 industries.
We clustered them by intent and generated 1,000 representative questions.
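For the curious, here's a minimal sketch of the clustering step, assuming sentence embeddings plus k-means. The embedding model, the hypothetical load_raw_queries() helper, and the nearest-to-centroid shortcut are illustrative choices, not our exact pipeline.

```python
# Sketch: cluster ~50k raw queries by intent, one cluster per target question.
# The embedder and k-means are illustrative choices, not our exact pipeline.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

queries = load_raw_queries()  # hypothetical loader for the scraped queries

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(queries, normalize_embeddings=True)

km = KMeans(n_clusters=1000, n_init="auto", random_state=0).fit(embeddings)

# Stand-in for the generation step: take the query nearest each centroid
# as that cluster's representative question.
nearest, _ = pairwise_distances_argmin_min(km.cluster_centers_, embeddings)
representative_questions = [queries[i] for i in nearest]
```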
We asked those same 1,000 questions to 5 AI models: ChatGPT (GPT-4), Gemini (Ultra), Perplexity (Pro), Claude (4.5 Sonnet), and Llama (3).
We ran the experiment daily for 30 days. We tracked every citation at the source level.
The goal: measure citation overlap.
How often do these models cite the same source for the same question?
The dataset:
1,000 questions × 30 days × 5 models = 150,000 total answer sets
47,382 unique citations tracked
12 industries with at least 80 questions each
4,923 distinct domains cited across all models
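For reference, every answer set boils down to a record like the sketch below. The schema and field names are illustrative assumptions, not our actual storage format.

```python
# Illustrative record for one answer set. Field names are assumptions,
# not an actual storage schema.
from dataclasses import dataclass, field

@dataclass
class AnswerSet:
    question_id: int   # 1..1000
    day: int           # 0..29
    model: str         # "chatgpt", "perplexity", "gemini", "claude", "llama"
    industry: str      # one of the 12 industries
    cited_domains: set[str] = field(default_factory=set)
```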
Overall citation overlap: 23.7%
Across all 5 models, the same source was cited in only 23.7% of answers. That means 76.3% of the time, the models were pulling from different places.
| Model Pair | Overlap |
|---|---|
| ChatGPT – Perplexity | 41.2% |
| ChatGPT – Gemini | 31.8% |
| ChatGPT – Claude | 19.4% |
| ChatGPT – Llama | 12.3% |
| Perplexity – Gemini | 33.1% |
| Perplexity – Claude | 21.7% |
| Perplexity – Llama | 14.1% |
| Gemini – Claude | 12.4% |
| Gemini – Llama | 9.8% |
| Claude – Llama | 8.2% |
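One way to compute an overlap number like this, sketched in Python: the share of (question, day) pairs on which two models cite at least one common domain. Treat it as a plausible reading, not necessarily the exact formula behind the table.

```python
# Sketch: pairwise citation overlap, read as the share of (question, day)
# pairs where both models cite at least one common domain.
from itertools import combinations

MODELS = ["chatgpt", "perplexity", "gemini", "claude", "llama"]

def pair_overlap(citations, a, b):
    """citations[model][(question_id, day)] -> set of cited domains."""
    shared_keys = citations[a].keys() & citations[b].keys()
    hits = sum(1 for k in shared_keys if citations[a][k] & citations[b][k])
    return hits / len(shared_keys) if shared_keys else 0.0

# Usage over all 10 pairs, matching the table above (citations built
# from the tracked answer sets):
# for a, b in combinations(MODELS, 2):
#     print(f"{a} vs {b}: {pair_overlap(citations, a, b):.1%}")
```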
Industry variance:
Overlap varied significantly by industry. Some industries had high consensus. Others were fragmented.
| Industry | Overlap (all 5 models) |
|---|---|
| Finance | 31.4% |
| Healthcare | 29.2% |
| SaaS | 26.8% |
| Energy | 24.1% |
| Manufacturing | 22.3% |
| Education | 21.0% |
| Real Estate | 19.7% |
| E‑commerce | 18.4% |
| Media | 16.2% |
| Consumer Goods | 14.9% |
| Travel | 13.5% |
| Professional Services | 12.1% |
Finance and healthcare had the highest overlap. Professional services and travel had the lowest.
Platform stability:
We also measured how often a model cited the same source for the same question across different days.
| Model | Consistency (same source day‑to‑day) |
|---|---|
| ChatGPT | 68.2% |
| Perplexity | 62.4% |
| Gemini | 54.1% |
| Claude | 41.7% |
| Llama | 33.2% |
ChatGPT was the most stable; Llama was the least. If ChatGPT cited your site for a question today, there was a 68% chance it would cite it for the same question tomorrow. With Llama, that chance was 33%.
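Here is one way to compute a consistency number like this, sketched in Python. It's a simplified reading (any repeat citation from day d to day d+1 counts), not necessarily the exact formula behind the table.

```python
# Sketch: day-to-day consistency for one model. A (question, day) pair
# counts as consistent if any domain cited on day d is cited again for
# the same question on day d + 1. Simplified reading, not the exact formula.
def consistency(model_citations):
    """model_citations[(question_id, day)] -> set of cited domains."""
    hits = total = 0
    for (question, day), sources in model_citations.items():
        next_day = model_citations.get((question, day + 1))
        if next_day is None:
            continue  # last day of the 30-day window, or a gap in the run
        total += 1
        hits += bool(sources & next_day)
    return hits / total if total else 0.0
```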
Source type breakdown:
We classified every cited source by type. The distribution varied dramatically by model.
| Source Type | ChatGPT | Perplexity | Gemini | Claude | Llama |
|---|---|---|---|---|---|
| Comparison pages | 28% | 24% | 12% | 8% | 6% |
| FAQs | 22% | 19% | 14% | 11% | 9% |
| News/press | 8% | 12% | 31% | 14% | 11% |
| Forums (Reddit, etc.) | 6% | 9% | 5% | 34% | 19% |
| Wikipedia | 12% | 14% | 21% | 9% | 7% |
| Blogs | 14% | 11% | 9% | 13% | 28% |
| E‑commerce/reviews | 6% | 7% | 4% | 6% | 12% |
| Other | 4% | 4% | 4% | 5% | 8% |
ChatGPT and Perplexity favored structured content (comparison pages, FAQs). Gemini favored press coverage and Wikipedia. Claude favored forums and community discussions. Llama favored blogs and reviews. These patterns shaped how we fine-tune our AI visibility model and how we calculate and track a brand's presence.
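One lightweight way to do this kind of source-type classification is pattern rules over the cited URL, as in the sketch below. The patterns are illustrative examples, not a complete ruleset.

```python
# Sketch: rule-based source-type classification from the cited URL.
# The patterns here are illustrative examples, not a complete ruleset.
import re
from urllib.parse import urlparse

RULES = [
    (r"reddit\.com|quora\.com|stackexchange\.com", "forum"),
    (r"wikipedia\.org", "wikipedia"),
    (r"/(vs-|versus|compare|comparison|best-)", "comparison_page"),
    (r"/(faq|help|support)", "faq"),
    (r"/(news|press)", "news_press"),
    (r"amazon\.|/reviews?", "ecommerce_reviews"),
    (r"/blog", "blog"),
]

def classify_source(url):
    parsed = urlparse(url)
    target = (parsed.netloc + parsed.path).lower()
    for pattern, label in RULES:
        if re.search(pattern, target):
            return label
    return "other"
```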
What we did with this data:
We stopped treating all models the same.
For ChatGPT and Perplexity: we increased structured content production by 340%. Comparison pages went from 12 to 52. FAQs from 8 to 31.
For Gemini: we hired a PR firm to target tech press. Media mentions increased 280% in 90 days.
For Claude: we started participating in 12 relevant subreddits daily. Forum citations increased 170%.
For Llama: we stopped optimizing entirely.
Results after 90 days:
| Model | Share of Voice Change |
|---|---|
| ChatGPT | +41% |
| Perplexity | +34% |
| Gemini | +18% |
| Claude | +12% |
| Llama | -3% |
We grew where it mattered. We stopped wasting time where it didn't.
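For anyone who wants to reproduce the math: one simple way to define share of voice is the fraction of a model's answers that cite your domain, with the change measured between the before and after windows. The sketch below uses that unweighted reading; it is an illustration, not necessarily the exact formula behind the table.

```python
# Sketch: share of voice = fraction of a model's answers citing your domain.
# Unweighted reading; a position-weighted variant is equally plausible.
def share_of_voice(answers, domain):
    """answers: list of sets of cited domains, one set per answer."""
    cited = sum(1 for sources in answers if domain in sources)
    return cited / len(answers) if answers else 0.0

def sov_change(before, after, domain):
    """Relative change between two windows, e.g. +0.41 for the +41% row."""
    b, a = share_of_voice(before, domain), share_of_voice(after, domain)
    return (a - b) / b if b else float("inf")
```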
What we learned:
The models don't agree. They each have their own sources, their own biases, their own citation patterns. Trying to win on all of them with one strategy is like trying to rank for every keyword with one page.
Pick your battles. Optimize for the models that drive your business. Let the rest be noise.
What we're curious about:
Have you noticed the same inconsistencies across models in your space? One platform citing you, another ignoring you? One pulling from Reddit, another from press releases? We're tracking this quarterly and publishing updates. Would love to hear what you're seeing.
Imed Radhouani
Founder & CTO – Rankfender
Evidence-based product development