Claro - Research Agents - Claro runs the AI agents that operate on your data
Today, we're opening the first public module of Claro: Research Agents.
10+ task-specific agents run inside a native table - enrichment, PDF extraction, URL scraping, classification, location lists, dedupe.
Every cell comes back with a confidence score, citations, and ranked sources.
Built validation-first, not a chat wrapper.
200 free credits on signup, no card.



Replies
Claro - Research Agents
Hey Product Hunt 👋 I'm Matteo, co-founder of Claro.
We built Claro because we kept losing trust in our own AI outputs. @tameesh and I spent years at Delivery Hero and Yelp trying to wrangle product and supplier data at scale. Tameesh then did NLP/LLM research and published at MIT Review. Every time we plugged an LLM into a data workflow, we got beautiful-looking answers with no way to know which ones were right.
Claro is the AI execution layer for product and supplier data. It takes messy supplier feeds, spreadsheets, and documents, turns them into trusted catalog entities with stable IDs, and keeps them validated and correct over time.
Today we're opening the first public module: Research Agents. 14 task-specific agents that run inside a native table:
- Find your perfect list: describe ideal prospects, suppliers, or partners, get an enriched dataset
- Build a location list: draw an area on a map, generate POIs with attributes
- Capture tables from PDFs: pull tables from reports, pick which to keep
- Turn documents into structured data: extract fields across hundreds of docs at once
- Scrape data from URLs: turn a list of links into structured rows
- Analyse CSV: upload a spreadsheet, validate and enrich it
- Merge & Map: dedupe and reconcile records across two files
...and more.
What makes Claro different is what happens to every answer before it hits your table:
- Multi-model consensus: we push extractions to multiple models in parallel and only accept where they agree (rough sketch after this list)
- LLM-as-judge filtering: a lightweight quality gate catches nonsensical outputs before they reach you
- Source ranking: first-party and authoritative sources above random blog posts, datasheets above application guides
- Confidence scores: every cell scored at row, column, and entity level based on source reliability, cross-source consistency, model certainty, and retrieval quality
- Full citations: click any cell to see the exact passage, URL, or document section it came from
- BM25 + AI hybrid matching: for entity resolution against your existing catalog, with human approval loops for borderline cases
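To make the consensus step concrete, here is a minimal sketch of the agreement gate. It is illustrative only, not our production code; `call_model` is a stand-in for whichever model client you'd actually use.

```python
# Minimal sketch of multi-model consensus: ask several models, accept only on agreement.
from collections import Counter

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real model client; returns the extracted value as text."""
    raise NotImplementedError

def consensus_extract(prompt: str, models: list[str], min_agree: int = 2):
    """Run the same extraction across models and keep the value only if enough agree."""
    answers = [call_model(m, prompt).strip().lower() for m in models]
    value, votes = Counter(answers).most_common(1)[0]
    if votes >= min_agree:
        return value, votes / len(models)   # accepted, with an agreement ratio
    return None, votes / len(models)        # no consensus: flag the cell for review
```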
It's not a chat wrapper. It's validation-first infrastructure with a spreadsheet you can use today.
Three things I'd love today:
1. Try it: 200 free credits, no card.
2. Break it: throw your ugliest CSV or weirdest PDF at it, and tell me what it got wrong (and what the confidence score said).
3. Tell me which agent you'd use first: we're deciding what to deepen next.
I'm pitching live at The Pitch Berlin today, so I'll be bouncing between stage and comments, but I'll reply to every one before EOD.
If you run a marketplace, distributor, or multi-supplier catalog and the underlying platform sounds relevant, DM me β we'd love to talk.
Thanks for taking a look! - Matteo
Claro - Research Agents
Thanks @rajiv_ayyangar for the support as Hunter!
Claro - Research Agents
Hello, Tameesh here, co-founder & CTO 👋
Happy to go deep on anything technical: how we compute confidence scores, how source ranking weights first-party vs open-web sources, how we orchestrate tool calls across extract/classify/generate/search, multi-model consensus, entity resolution with BM25 + embeddings, graph-driven validation, what happens when the model is uncertain.
Short version of what I'm proud of: the confidence score isn't a vibes check. Source reliability 40%, cross-source consistency 30%, model certainty from output variance 20%, retrieval quality 10%. We run multi-model consensus for high-stakes enrichments and LLM-as-judge as a lightweight filter on top. You can sort a 100k-row dataset by confidence and review only the reds. That's the difference between "AI output" and "AI output you can use."
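In code terms the weighting is roughly this simple. This is an illustrative sketch using the weights above, not our actual implementation, and it assumes each signal is already normalized to 0-1:

```python
# Illustrative only: the four weighted signals, each assumed pre-normalized to 0..1.
WEIGHTS = {
    "source_reliability": 0.40,
    "cross_source_consistency": 0.30,
    "model_certainty": 0.20,   # e.g. derived from output variance across runs
    "retrieval_quality": 0.10,
}

def cell_confidence(signals: dict[str, float]) -> float:
    """Weighted sum used to sort and filter cells (review only the reds)."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

# cell_confidence({"source_reliability": 0.9, "cross_source_consistency": 0.8,
#                  "model_certainty": 0.7, "retrieval_quality": 0.6})  # -> 0.80
```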
Drop your hardest technical question below.
The confidence score + citations on every cell is the interesting part. How does that work across the PDF extraction agent specifically? PDFs can be messy: scanned docs, weird formatting, tables within tables. Does the confidence score drop noticeably on low-quality source material, or does it stay falsely high?
Claro - Research Agents
@sounak_bhattacharya Honest answer: yes, confidence drops on messy PDFs, and that's by design.
Scanned docs go through OCR first, and OCR confidence feeds directly into the cell-level score.
Blurry scan = lower score before extraction even starts. Nested tables and merged cells sometimes produce partial extractions, which pulls the retrieval quality signal down.
The failure mode we guard against is exactly what you described: falsely high confidence on bad sources.
Two things help: cross-source consistency (if two PDFs disagree on the same product, both scores drop) and single-source conservatism (messy doc + one source = we'd rather show a yellow badge than a false green).
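For illustration, the shape of it is something like this. The thresholds and helper name are made up, not the real scoring code:

```python
# Rough sketch: OCR quality and source count pull a PDF cell's confidence down.
def adjust_for_pdf_quality(base_confidence: float,
                           ocr_confidence: float,
                           source_count: int,
                           sources_agree: bool) -> float:
    score = base_confidence * ocr_confidence   # blurry scan lowers the score before extraction starts
    if source_count >= 2 and not sources_agree:
        score *= 0.5                           # disagreeing PDFs drag both cells down
    if source_count == 1:
        score = min(score, 0.7)                # single messy source: cap below "green"
    return score
```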
If you have a tricky PDF you want to throw at it, DM me: I genuinely want to see where it breaks.
The confidence score breakdown is solid; the 30% weight on cross-source consistency is often what's missing in tools like this, which over-rely on model certainty. Tameesh, on multi-model consensus: how do you handle cases where models agree but are wrong together (shared training bias)? Is the LLM-as-judge filter running on a model from a different family, or is something else catching it? Also curious about your default BM25/embeddings ratio for entity resolution. We ended up at 60/40 BM25-dominant on a similar catalog matching use case. Good launch.
Claro - Research Agents
@julien_zammit Great questions, Julien.
On shared training bias: the LLM-as-judge filter deliberately runs on a different model family than the extraction models. But honestly the source hierarchy does more work here: if all models agree AND the answer is grounded in a real source passage, correlated hallucination risk is low. When models agree but the source-to-query semantic match is weak, the confidence score drops even with consensus. That's the retrieval quality signal (the 10% weight) acting as a backstop.
On BM25/embeddings: we're at roughly 70/30 BM25-dominant for structured catalog matching, close to your 60/40. BM25 is hard to beat when field names are standardized. Embeddings add the most when matching across languages or divergent supplier naming conventions.
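Sketch of what that hybrid score can look like. Illustrative only, not our pipeline: `embed` is a stand-in for whatever embedding model you use, and `rank_bm25` is just a convenient open-source BM25 implementation.

```python
# Illustrative 70/30 BM25-dominant hybrid score for catalog entity matching.
import numpy as np
from rank_bm25 import BM25Okapi   # pip install rank-bm25

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding model; assumed to return a unit-normalized vector."""
    raise NotImplementedError

def hybrid_scores(query: str, candidates: list[str],
                  bm25_w: float = 0.7, emb_w: float = 0.3) -> list[float]:
    bm25 = BM25Okapi([c.lower().split() for c in candidates])
    lexical = bm25.get_scores(query.lower().split())
    lexical = lexical / (lexical.max() or 1.0)            # normalize BM25 scores to 0..1
    q = embed(query)
    semantic = [float(q @ embed(c)) for c in candidates]  # cosine, since vectors are unit-norm
    return [bm25_w * l + emb_w * s for l, s in zip(lexical, semantic)]
```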
Would love to hear more about your setup; my DMs are open.
jared.so
Multi-model consensus + source ranking + confidence-per-cell is the validation-first posture every AI-data tool should adopt; chat wrappers produce confident garbage. LLM-as-judge on a different model family to avoid correlated bias is a thoughtful detail. Curious how Claro handles the cost blow-up when running everything through multi-model consensus at catalog scale.
Claro - Research Agents
@mcarmonas Totally fair question - running full multi-model consensus everywhere would blow up costs fast.
At Claro we keep it selective: start with a cheaper model, and only escalate to multi-model + judge when there's low confidence or disagreement. If models agree early with solid sources, we stop there.
Good retrieval + caching also cut a lot of repeat work.
So it's more "strongly grounded, confident answers, with consensus only where it actually changes the answer."
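In sketch form the escalation policy looks roughly like this. The helpers and the threshold are hypothetical, just to show the shape:

```python
# Illustrative escalation: one cheap model first, consensus + judge only on low confidence.
CONFIDENCE_FLOOR = 0.75   # made-up threshold

def cheap_extract(prompt: str) -> str: ...                    # single inexpensive model call
def score_confidence(value: str, prompt: str) -> float: ...   # source/retrieval/certainty signals
def consensus_extract(prompt: str) -> tuple[str, float]: ...  # multi-model + LLM-as-judge path

def extract_with_escalation(prompt: str) -> tuple[str, float]:
    value = cheap_extract(prompt)
    confidence = score_confidence(value, prompt)
    if confidence >= CONFIDENCE_FLOOR:
        return value, confidence      # solid sources and early agreement: stop here
    return consensus_extract(prompt)  # otherwise pay for the expensive path
```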