We asked 5 AI models the same 1,000 questions. How often do you think they agreed?
We built a model to generate 1,000 questions that people actually ask.
Not random prompts.
We scraped 50,000 real user queries from search logs, forum threads, and support tickets across 12 industries.
We clustered them by intent and generated 1,000 representative questions.
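For the curious, here's a minimal sketch of the clustering step, assuming sentence embeddings plus k-means. The embedding model, the hypothetical load_raw_queries() helper, and the nearest-to-centroid shortcut are illustrative choices, not our exact pipeline.

```python
# Sketch: cluster ~50k raw queries by intent, one cluster per target question.
# The embedder and k-means are illustrative choices, not our exact pipeline.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

queries = load_raw_queries()  # hypothetical loader for the scraped queries

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(queries, normalize_embeddings=True)

km = KMeans(n_clusters=1000, n_init="auto", random_state=0).fit(embeddings)

# Stand-in for the generation step: take the query nearest each centroid
# as that cluster's representative question.
nearest, _ = pairwise_distances_argmin_min(km.cluster_centers_, embeddings)
representative_questions = [queries[i] for i in nearest]
```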
We asked those same 1,000 questions to 5 AI models: ChatGPT (GPT-4), Gemini (Ultra), Perplexity (Pro), Claude (4.5 Sonnet), and Llama (3).
We ran the experiment daily for 30 days. We tracked every citation at the source level.
The goal: measure citation overlap.
How often do these models cite the same source for the same question?
The dataset:
1,000 questions × 30 days × 5 models = 150,000 total answer sets
47,382 unique citations tracked
12 industries with at least 80 questions each
4,923 distinct domains cited across all models
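For reference, every answer set boils down to a record like the sketch below. The schema and field names are illustrative assumptions, not our actual storage format.

```python
# Illustrative record for one answer set. Field names are assumptions,
# not an actual storage schema.
from dataclasses import dataclass, field

@dataclass
class AnswerSet:
    question_id: int   # 1..1000
    day: int           # 0..29
    model: str         # "chatgpt", "perplexity", "gemini", "claude", "llama"
    industry: str      # one of the 12 industries
    cited_domains: set[str] = field(default_factory=set)
```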
Overall citation overlap: 23.7%
Across all 5 models, the same source was cited in only 23.7% of answers. That means 76.3% of the time, the models were pulling from different places.
| Model Pair | Overlap |
|---|---|
| ChatGPT – Perplexity | 41.2% |
| ChatGPT – Gemini | 31.8% |
| ChatGPT – Claude | 19.4% |
| ChatGPT – Llama | 12.3% |
| Perplexity – Gemini | 33.1% |
| Perplexity – Claude | 21.7% |
| Perplexity – Llama | 14.1% |
| Gemini – Claude | 12.4% |
| Gemini – Llama | 9.8% |
| Claude – Llama | 8.2% |
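One way to compute an overlap number like this, sketched in Python: the share of (question, day) pairs on which two models cite at least one common domain. Treat it as a plausible reading, not necessarily the exact formula behind the table.

```python
# Sketch: pairwise citation overlap, read as the share of (question, day)
# pairs where both models cite at least one common domain.
from itertools import combinations

MODELS = ["chatgpt", "perplexity", "gemini", "claude", "llama"]

def pair_overlap(citations, a, b):
    """citations[model][(question_id, day)] -> set of cited domains."""
    shared_keys = citations[a].keys() & citations[b].keys()
    hits = sum(1 for k in shared_keys if citations[a][k] & citations[b][k])
    return hits / len(shared_keys) if shared_keys else 0.0

# Usage over all 10 pairs, matching the table above (citations built
# from the tracked answer sets):
# for a, b in combinations(MODELS, 2):
#     print(f"{a} vs {b}: {pair_overlap(citations, a, b):.1%}")
```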
Industry variance:
Overlap varied significantly by industry. Some industries had high consensus. Others were fragmented.
| Industry | Overlap (all 5 models) |
|---|---|
| Finance | 31.4% |
| Healthcare | 29.2% |
| SaaS | 26.8% |
| Energy | 24.1% |
| Manufacturing | 22.3% |
| Education | 21.0% |
| Real Estate | 19.7% |
| E‑commerce | 18.4% |
| Media | 16.2% |
| Consumer Goods | 14.9% |
| Travel | 13.5% |
| Professional Services | 12.1% |
Finance and healthcare had the highest overlap. Professional services and travel had the lowest.
Platform stability:
We also measured how often a model cited the same source for the same question across different days.
| Model | Consistency (same source day‑to‑day) |
|---|---|
| ChatGPT | 68.2% |
| Perplexity | 62.4% |
| Gemini | 54.1% |
| Claude | 41.7% |
| Llama | 33.2% |
ChatGPT was the most stable; Llama was the least. If ChatGPT cited your site for a question today, there was a 68% chance it would cite it for the same question tomorrow. With Llama, that chance was 33%.
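Here is one way to compute a consistency number like this, sketched in Python. It's a simplified reading (any repeat citation from day d to day d+1 counts), not necessarily the exact formula behind the table.

```python
# Sketch: day-to-day consistency for one model. A (question, day) pair
# counts as consistent if any domain cited on day d is cited again for
# the same question on day d + 1. Simplified reading, not the exact formula.
def consistency(model_citations):
    """model_citations[(question_id, day)] -> set of cited domains."""
    hits = total = 0
    for (question, day), sources in model_citations.items():
        next_day = model_citations.get((question, day + 1))
        if next_day is None:
            continue  # last day of the 30-day window, or a gap in the run
        total += 1
        hits += bool(sources & next_day)
    return hits / total if total else 0.0
```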
Source type breakdown:
We classified every cited source by type. The distribution varied dramatically by model.
| Source Type | ChatGPT | Perplexity | Gemini | Claude | Llama |
|---|---|---|---|---|---|
| Comparison pages | 28% | 24% | 12% | 8% | 6% |
| FAQs | 22% | 19% | 14% | 11% | 9% |
| News/press | 8% | 12% | 31% | 14% | 11% |
| Forums (Reddit, etc.) | 6% | 9% | 5% | 34% | 19% |
| Wikipedia | 12% | 14% | 21% | 9% | 7% |
| Blogs | 14% | 11% | 9% | 13% | 28% |
| E‑commerce/reviews | 6% | 7% | 4% | 6% | 12% |
| Other | 4% | 4% | 4% | 5% | 8% |
ChatGPT and Perplexity favored structured content (comparison pages, FAQs). Gemini favored press coverage and Wikipedia. Claude favored forums and community discussions. Llama favored blogs and reviews. These patterns shaped how we fine-tune our AI visibility model and how we calculate and track a brand's presence.
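One lightweight way to do this kind of source-type classification is pattern rules over the cited URL, as in the sketch below. The patterns are illustrative examples, not a complete ruleset.

```python
# Sketch: rule-based source-type classification from the cited URL.
# The patterns here are illustrative examples, not a complete ruleset.
import re
from urllib.parse import urlparse

RULES = [
    (r"reddit\.com|quora\.com|stackexchange\.com", "forum"),
    (r"wikipedia\.org", "wikipedia"),
    (r"/(vs-|versus|compare|comparison|best-)", "comparison_page"),
    (r"/(faq|help|support)", "faq"),
    (r"/(news|press)", "news_press"),
    (r"amazon\.|/reviews?", "ecommerce_reviews"),
    (r"/blog", "blog"),
]

def classify_source(url):
    parsed = urlparse(url)
    target = (parsed.netloc + parsed.path).lower()
    for pattern, label in RULES:
        if re.search(pattern, target):
            return label
    return "other"
```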
What we did with this data:
We stopped treating all models the same.
For ChatGPT and Perplexity: we increased structured content production by 340%. Comparison pages went from 12 to 52. FAQs from 8 to 31.
For Gemini: we hired a PR firm to target tech press. Media mentions increased 280% in 90 days.
For Claude: we started participating in 12 relevant subreddits daily. Forum citations increased 170%.
For Llama: we stopped optimizing entirely.
Results after 90 days:
| Model | Share of Voice Change |
|---|---|
| ChatGPT | +41% |
| Perplexity | +34% |
| Gemini | +18% |
| Claude | +12% |
| Llama | -3% |
We grew where it mattered. We stopped wasting time where it didn't.
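For anyone who wants to reproduce the math: one simple way to define share of voice is the fraction of a model's answers that cite your domain, with the change measured between the before and after windows. The sketch below uses that unweighted reading; it is an illustration, not necessarily the exact formula behind the table.

```python
# Sketch: share of voice = fraction of a model's answers citing your domain.
# Unweighted reading; a position-weighted variant is equally plausible.
def share_of_voice(answers, domain):
    """answers: list of sets of cited domains, one set per answer."""
    cited = sum(1 for sources in answers if domain in sources)
    return cited / len(answers) if answers else 0.0

def sov_change(before, after, domain):
    """Relative change between two windows, e.g. +0.41 for the +41% row."""
    b, a = share_of_voice(before, domain), share_of_voice(after, domain)
    return (a - b) / b if b else float("inf")
```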
What we learned:
The models don't agree. They each have their own sources, their own biases, their own citation patterns. Trying to win on all of them with one strategy is like trying to rank for every keyword with one page.
Pick your battles. Optimize for the models that drive your business. Let the rest be noise.
What we're curious about:
Have you noticed the same inconsistencies across models in your space? One platform citing you, another ignoring you? One pulling from Reddit, another from press releases? We're tracking this quarterly and publishing updates. Would love to hear what you're seeing.
Imed Radhouani
Founder & CTO – Rankfender
Evidence-based product development