
We asked 5 AI models the same 1,000 questions. How often do you think they agreed?

We built a model to generate 1,000 questions that people actually ask.

Not random prompts.

We scraped 50,000 real user queries from search logs, forum threads, and support tickets across 12 industries.
We clustered them by intent and generated 1,000 representative questions.
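For the curious, the clustering step looked roughly like this. A minimal sketch in Python, assuming sentence-transformers embeddings and scikit-learn k-means; the encoder name and the pick-nearest-to-centroid heuristic are illustrative assumptions, not our exact pipeline.

```python
# Minimal sketch: embed queries, cluster by intent, pick one representative
# per cluster. Encoder choice and the nearest-to-centroid heuristic are
# illustrative assumptions, not our exact pipeline.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def representative_questions(queries, n_clusters=1000):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder
    embeddings = encoder.encode(queries, normalize_embeddings=True)

    km = KMeans(n_clusters=n_clusters, n_init="auto", random_state=0).fit(embeddings)

    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # The query closest to the cluster centroid stands in for the intent.
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        reps.append(queries[members[np.argmin(dists)]])
    return reps
```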

We asked those same 1,000 questions to 5 AI models: ChatGPT (GPT-4), Gemini (Ultra), Perplexity (Pro), Claude (4.5 Sonnet), and Llama (3).
We ran the experiment daily for 30 days. We tracked every citation at the source level.
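Every answer set got decomposed into citation records. Here's a sketch of the shape those records take; the field names are hypothetical, not our actual schema:

```python
# Hypothetical record shape for one tracked citation; names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    question_id: int   # 1..1000, stable across the 30 days
    day: int           # 1..30
    model: str         # "chatgpt", "perplexity", "gemini", "claude", "llama"
    domain: str        # cited source, normalized to its registrable domain
    url: str           # the full cited URL
    industry: str      # one of the 12 industry buckets
```

The sketches further down all assume a flat list of these records.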


The goal: measure citation overlap.

How often do these models cite the same source for the same question?

The dataset:

  • 1,000 questions × 30 days × 5 models = 150,000 total answer sets

  • 47,382 unique citations tracked

  • 12 industries with at least 80 questions each

  • 4,923 distinct domains cited across all models

Overall citation overlap: 23.7%

Across all 5 models, the same source showed up in only 23.7% of answer sets for the same question. The other 76.3% of the time, the models were pulling from different places.
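Here's roughly the math behind that number, and behind the pairwise rates in the table below. A minimal sketch over the Citation records above; it reads "overlap" as "the models share at least one cited domain for the same question on the same day", which may differ from our exact production definition.

```python
# Sketch of the overlap math over Citation records. "Overlap" here means:
# for a given (question, day), the models share at least one cited domain.
# This reading is an assumption; the production definition may differ.
from collections import defaultdict
from itertools import combinations

def overlap_rates(citations):
    cited = defaultdict(set)  # (question_id, day, model) -> cited domains
    for c in citations:
        cited[(c.question_id, c.day, c.model)].add(c.domain)

    models = sorted({c.model for c in citations})
    keys = {(c.question_id, c.day) for c in citations}

    all_five, pairwise = 0, defaultdict(int)
    for q, d in keys:
        sets = {m: cited[(q, d, m)] for m in models}
        if set.intersection(*sets.values()):
            all_five += 1  # every model cited at least one common domain
        for a, b in combinations(models, 2):
            if sets[a] & sets[b]:
                pairwise[(a, b)] += 1

    n = len(keys)
    return all_five / n, {pair: hits / n for pair, hits in pairwise.items()}
```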

Model Pair              Overlap (%)
ChatGPT – Perplexity    41.2%
ChatGPT – Gemini        31.8%
ChatGPT – Claude        19.4%
ChatGPT – Llama         12.3%
Perplexity – Gemini     33.1%
Perplexity – Claude     21.7%
Perplexity – Llama      14.1%
Gemini – Claude         12.4%
Gemini – Llama           9.8%
Claude – Llama           8.2%


Industry variance:

Overlap varied significantly by industry. Some industries had high consensus. Others were fragmented.

Industry                 Overlap (all 5 models)
Finance                  31.4%
Healthcare               29.2%
SaaS                     26.8%
Energy                   24.1%
Manufacturing            22.3%
Education                21.0%
Real Estate              19.7%
E‑commerce               18.4%
Media                    16.2%
Consumer Goods           14.9%
Travel                   13.5%
Professional Services    12.1%

Finance and healthcare had the highest overlap. Professional services and travel had the lowest.


Platform stability:

We also measured how often a model cited the same source for the same question across different days.

Model         Consistency (same source day‑to‑day)
ChatGPT       68.2%
Perplexity    62.4%
Gemini        54.1%
Claude        41.7%
Llama         33.2%

ChatGPT was the most stable. Llama was the least. If you appeared in ChatGPT today, you had a 68% chance of appearing tomorrow. If you appeared in Llama, you had a 33% chance.
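The stability metric, sketched. One plausible definition, assuming the Citation records above: for each question, how often does a model cite at least one of the same domains on consecutive days?

```python
# Sketch of day-to-day consistency for one model. Definition assumed here:
# the share of consecutive-day pairs where today's and tomorrow's citation
# sets for the same question share at least one domain.
from collections import defaultdict

def day_to_day_consistency(citations, model, n_days=30):
    cited = defaultdict(set)  # (question_id, day) -> domains cited by `model`
    for c in citations:
        if c.model == model:
            cited[(c.question_id, c.day)].add(c.domain)

    hits = pairs = 0
    for q in {qid for qid, _ in cited}:
        for day in range(1, n_days):
            today, tomorrow = cited[(q, day)], cited[(q, day + 1)]
            if today and tomorrow:
                pairs += 1
                hits += bool(today & tomorrow)
    return hits / pairs if pairs else 0.0
```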


Source type breakdown:

We classified every cited source by type. The distribution varied dramatically by model.

Source Type              ChatGPT   Perplexity   Gemini   Claude   Llama
Comparison pages         28%       24%          12%      8%       6%
FAQs                     22%       19%          14%      11%      9%
News/press               8%        12%          31%      14%      11%
Forums (Reddit, etc.)    6%        9%           5%       34%      19%
Wikipedia                12%       14%          21%      9%       7%
Blogs                    14%       11%          9%       13%      28%
E‑commerce/reviews       6%        7%           4%       6%       12%
Other                    4%        4%           4%       5%       8%

ChatGPT and Perplexity favored structured content (comparisons, FAQs). Gemini favored press and Wikipedia. Claude favored forums and community discussions. Llama favored blogs and reviews. This helped us fine-tune our AI visibility measurement model and how we calculate and track a brand's visibility.
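How do you sort 4,923 domains into those buckets? A simplified sketch; the domain map and URL-path heuristics below are illustrative stand-ins for our real classifier, which also uses page-level signals.

```python
# Simplified source-type classifier. The domain map and path heuristics are
# illustrative assumptions; a real classifier needs page-level signals too.
DOMAIN_TYPES = {
    "wikipedia.org": "Wikipedia",
    "reddit.com": "Forums",
    "news.ycombinator.com": "Forums",
    "g2.com": "E-commerce/reviews",
}

def classify_source(domain, path=""):
    for known, label in DOMAIN_TYPES.items():
        if domain == known or domain.endswith("." + known):
            return label
    path = path.lower()
    if "-vs-" in path or "/compare" in path:
        return "Comparison pages"
    if "/faq" in path:
        return "FAQs"
    if "/press" in path or "/news" in path:
        return "News/press"
    if "/blog" in path:
        return "Blogs"
    return "Other"
```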


What we did with this data:

We stopped treating all models the same.

  • For ChatGPT and Perplexity: we increased structured content production by 340%. Comparison pages went from 12 to 52. FAQs from 8 to 31.

  • For Gemini: we hired a PR firm to target tech press. Media mentions increased 280% in 90 days.

  • For Claude: we started participating in 12 relevant subreddits daily. Forum citations increased 170%.

  • For Llama: we stopped optimizing entirely.

Results after 90 days:

Model         Share of Voice Change
ChatGPT       +41%
Perplexity    +34%
Gemini        +18%
Claude        +12%
Llama         -3%


We grew where it mattered. We stopped wasting time where it didn't.
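For reference, "share of voice" in the table above is computed roughly like this. A sketch over the Citation records, with OUR_DOMAINS as a hypothetical placeholder for whichever brand you're tracking:

```python
# Sketch of share-of-voice per model: the fraction of a model's answer sets
# that cite any of a brand's domains. OUR_DOMAINS is a hypothetical placeholder.
from collections import defaultdict

OUR_DOMAINS = {"example.com"}  # placeholder brand domains

def share_of_voice(citations, model):
    answers = defaultdict(set)  # (question_id, day) -> domains in that answer
    for c in citations:
        if c.model == model:
            answers[(c.question_id, c.day)].add(c.domain)
    cited_us = sum(1 for domains in answers.values() if domains & OUR_DOMAINS)
    return cited_us / len(answers) if answers else 0.0

def sov_change(before, after):
    # Relative change: +0.41 means share of voice grew 41%.
    return (after - before) / before
```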

What we learned:

The models don't agree. They each have their own sources, their own biases, their own citation patterns. Trying to win on all of them with one strategy is like trying to rank for every keyword with one page.

Pick your battles. Optimize for the models that drive your business. Let the rest be noise.

What we're curious about:

Have you noticed the same inconsistencies across models in your space? One platform citing you, another ignoring you? One pulling from Reddit, another from press releases? We're tracking this quarterly and publishing updates. Would love to hear what you're seeing.

Imed Radhouani
Founder & CTO – Rankfender
Evidence-based product development
