Vet - Keep your coding agents honest
Vet is a fast and local code review tool open-sourced by the Imbue team. It’s concise where others are verbose, and it catches more relevant issues.
Vet verifies your coding agent's work by considering your conversation history to ensure the agent's actions align with your requests. It catches the silent failures: features half-implemented, tests claimed but never run.
It also reviews full PRs, catching logic errors, unhandled edge cases, and deviations from stated goals.



Replies
Imbue
Super interesting! We'll try it out for our vibecoding platform at matterhorn.so!
Imbue
@abhinavramesh let us know what you think! You’re welcome to also share feedback, raise an issue, or help contribute to the open-source project: https://github.com/imbue-ai/vet
🙌
This is the missing piece in the AI coding workflow. We all got comfortable letting agents write code, but verifying what they produce is still mostly manual eyeballing. Love that it's open source too - makes it way easier to trust and customize for different codebases. What's the performance overhead like on larger repos?
Imbue
@emad_ibrahim Thanks for the kind words! In general, Vet gets slower and more expensive as diffs and codebases grow, up to the point where the model's context window becomes the limit. The upper bound on cost and time is quite low: I'd expect at most about 15 seconds in the base configuration on the largest diffs and codebases, increasing with the use of agentic identifiers.
The 'catches silent failures' angle is what gets me — half-implemented features and tests that were claimed but never actually run are exactly the kind of things that slip through normal code review because reviewers trust that the agent did what it said. How does it handle situations where the conversation history is ambiguous, or the original request was vague to begin with?
Imbue
@zerodarkhub really happy to hear that the positioning of Vet landed. It's at the core of what Vet is built around. The encouraging (and honest) answer is: you should give it a try!
Vet is so easy to spin up, try out, and fit into any workflow. One line to install: https://github.com/imbue-ai/vet
Eager to hear about your experience. 🙌
Imbue
@zerodarkhub Since Vet has access to the entire conversation history it can disambiguate to the extent the human has specified expectations (not just the initial request). If the human didn't give sufficient specification, Vet will try to understand the intent of the user's request and will then evaluate the changes made by the agent according to the intent.
Imbue
@andrewlaack it has been great watching you work through all the iterations of Vet to get it here. Congrats on the public launch!
Imbue
@andrewlaack @thisisehsan +1 !
Curious how Vet handles the audit trail when an agent makes changes across multiple repos: do you log at the diff level, or capture the full agent reasoning chain too? Trying to figure out where the boundary between "agent decision" and "human accountability" sits in your model.
Imbue
@avinash_matrixgard Vet runs with the current conversation and the diff within a specific git repo. This defaults to the git repo in the CWD, but if an agent is making changes across repos, the agent can run Vet against any single repo (multi-repo diffs aren't supported yet).
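To make that per-repo scoping concrete, here is a minimal sketch of how a cross-repo workflow could drive it: run the review once per checkout. The `run_review` stub below stands in for the real Vet invocation (the actual CLI isn't documented here, so that part is an assumption), and the demo directories are created just for illustration.

```shell
#!/bin/sh
# Sketch of per-repo scoping: Vet reviews the diff in the current working
# directory's git repo, so a cross-repo workflow runs it once per checkout.
# run_review is a stand-in for the actual Vet CLI call (an assumption).
set -e

run_review() {
  # Replace this stub with the real Vet invocation.
  echo "reviewing $(basename "$PWD")"
}

sandbox=$(mktemp -d)                     # demo checkouts for illustration
mkdir -p "$sandbox/service-a" "$sandbox/service-b"

for repo in "$sandbox"/service-*; do
  ( cd "$repo" && run_review )           # subshell keeps the cwd unchanged
done
```

Running each review in a subshell keeps the `cd` from leaking between iterations, which matters because Vet keys off the working directory.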
@andrewlaack That scoping makes sense as a starting point; per-repo isolation is actually the safer default anyway, since multi-repo diffs tend to obscure accountability fast. The interesting edge case will be when agents start making coordinated changes across repos (a shared-library update that cascades into three downstream services, for example). Curious whether you're thinking about that as a Vet problem to solve, or something that stays in the orchestration layer above it?
Imbue
@avinash_matrixgard I'm interested in seeing what the future of multi-repo changes looks like, because I haven't been able to use coding agents for tasks that span multiple repos. As such, I haven't thought too much about this problem, so right now I'd say it's the responsibility of a higher-level orchestration agent. That said, I could see allowing multi-repo specification, similar to how submodules work, with the orchestration agent then passing Vet a goal based on its interpretation of the work that must get done.
@andrewlaack That makes sense as a division of responsibility. The submodule analogy is interesting; the tricky part there is that the orchestration agent needs enough context about each repo's architecture to set a meaningful goal for Vet, which starts to feel like it needs its own understanding of cross-repo dependencies before it can even frame the task correctly.
Curious whether you've thought about how Vet handles cases where the agent's interpretation of "done" diverges from what the codebase actually needs, like when the goal is technically achieved but introduces subtle regressions across service boundaries. Is that caught at the Vet level, or does it fall back to the orchestration layer to verify?
I tried this with Clawdbot and it successfully caught a 'silent failure' where the agent skipped a test. I can't get it to call Vet every time, though. Do you know how I can prompt it to always call Vet before reporting a task as complete?
Imbue
@cj_studio This was an issue I saw early on when using Vet as a skill in OpenCode. I found the wording in the skill ("call Vet after logical groups of changes") sufficed for coding, but you might want to make it stronger with something like "Run Vet at the end of each turn", or use OpenClaw hooks to enforce running Vet at session end.
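To sketch what that enforcement could look like (hedged heavily: the hook registration and the exact Vet command are assumptions, not OpenClaw's documented interface), a session-end hook can be a small gate script whose exit status blocks the turn from finishing while the review fails:

```shell
#!/bin/sh
# Hypothetical session-end gate: run a review command and block task
# completion if it reports issues. The real hook registration and the
# Vet CLI invocation are assumptions; wire in the actual command.

run_vet_gate() {
  # "$@" is the review command; a non-zero exit blocks completion.
  if "$@"; then
    echo "gate passed: safe to report the task as complete"
  else
    echo "Vet flagged issues: review before marking the task complete" >&2
    return 1
  fi
}

# Demo with a stand-in command (replace `true` with the real Vet call):
run_vet_gate true
```

The design point is that a hook's exit code is enforceable in a way prompt wording is not: the agent can forget an instruction, but it can't skip a hook the harness runs for it.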
The silent failures framing is sharp — half-implemented features and unclaimed value are the real activation killers in most B2B products, not churn from explicit dissatisfaction. Curious how Vet surfaces these gaps: is it correlating usage data against the expected activation path, or more of a qualitative signal from user sessions? The distinction matters because one tells you what's broken and the other tells you why. Would be interested in how this fits into a team's existing analytics stack — does it layer on top of tools like Mixpanel or Amplitude, or replace them for the activation layer?
Imbue
@jscanzi thanks for the comment and support! Vet is a code review tool for developers, not a product analytics layer. It reads your coding agent's conversation history alongside the diff to catch bugs, unresolved issues, false claims of completion, and so on. No usage data, activation funnels, or integrations with tools like Mixpanel or Amplitude. Vet lives in a developer's workflow.
Maybe I misunderstood your question, though. Mind restating?
@mrtibbets Thank you for your feedback Alexander!
Imbue
@jscanzi you are very welcome. 🙌
@mrtibbets interesting problem to tackle.
One of the biggest trust issues with coding agents is exactly this. Silent assumptions and fake outputs that look correct on the surface.
I like the idea of verifying the agent against the conversation history.
However, I'm curious about two things.
How does Vet handle long threads with multiple iterations of instructions?
And do you see this becoming a layer that sits between the developer and the agent permanently, almost like an AI code auditor?
Imbue
@taimur_haider1 glad to hear the conversation history angle resonates. On long threads: you should give it a try and push Vet to see how it performs through complicated conversations. We'd be excited to hear whether it fits your coding and agent behaviors.
On the second question: yes, that's a potential direction for Vet, and it's how many on our team use it internally. Less a one-time review tool, more an always-on auditor. Hope you find value in Vet like we do!
@mrtibbets Appreciate the answer, Alexander.
The ‘always-on auditor’ direction makes sense, especially as agent workflows get longer and harder to track mentally.
One thing I’ve seen with AI dev tools is that trust often breaks at the explanation layer, not just the output layer. When an agent shows the reasoning trail clearly, adoption tends to increase a lot.
Curious, are you planning to surface reasoning paths inside Vet as well, or keep it focused purely on verification?