Vet - Keep your coding agents honest
Vet is a fast and local code review tool open-sourced by the Imbue team. It’s concise where others are verbose, and it catches more relevant issues.
Vet verifies your coding agent's work by considering your conversation history to ensure the agent's actions align with your requests. It catches the silent failures: features half-implemented, tests claimed but never run.
It also reviews full PRs, catching logic errors, unhandled edge cases, and deviations from stated goals.



Replies
Imbue
Super interesting! We'll try it out for our vibecoding platform at matterhorn.so!
Imbue
@abhinavramesh let us know what you think! You’re welcome to also share feedback, raise an issue, or help contribute to the open-source project: https://github.com/imbue-ai/vet
🙌
This is the missing piece in the AI coding workflow. We all got comfortable letting agents write code, but verifying what they produce is still mostly manual eyeballing. Love that it's open source too - makes it way easier to trust and customize for different codebases. What's the performance overhead like on larger repos?
Imbue
@emad_ibrahim Thanks for the kind words! In general, Vet gets slower and more expensive as diffs and codebases grow, up to the point where the model's context window becomes the limit. The upper bound on cost and time is quite low: I'd expect at most about 15 seconds in the base configuration on the largest diffs and codebases, increasing with the use of agentic identifiers.
The 'catches silent failures' angle is what gets me — half-implemented features and tests that were claimed but never actually run are exactly the kind of things that slip through normal code review because reviewers trust that the agent did what it said. How does it handle situations where the conversation history is ambiguous, or the original request was vague to begin with?
Imbue
@zerodarkhub really happy to hear that the positioning of Vet landed. It's at the core of what Vet is built around. The encouraging (and honest) answer is: you should give it a try!
Vet is so easy to spin up, try out, and fit into any workflow. One line to install: https://github.com/imbue-ai/vet
Eager to hear about your experience. 🙌
Imbue
@zerodarkhub Since Vet has access to the entire conversation history it can disambiguate to the extent the human has specified expectations (not just the initial request). If the human didn't give sufficient specification, Vet will try to understand the intent of the user's request and will then evaluate the changes made by the agent according to the intent.
Imbue
@andrewlaack it has been great watching you work through all the iterations of Vet to get it here. Congrats on the public launch!
Imbue
@andrewlaack @thisisehsan +1 !
Curious how Vet handles the audit trail when an agent makes changes across multiple repos: do you log at the diff level, or capture the full agent reasoning chain too? Trying to figure out where the boundary between "agent decision" and "human accountability" sits in your model.
Imbue
@avinash_matrixgard Vet runs with the current conversation and the diff within a specific git repo. This defaults to the git repo in the CWD, but if an agent is making changes across repos, the agent can run Vet against any single repo (multi-repo diffs aren't supported yet).
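To make that per-repo scoping concrete, here is a minimal sketch of how a cross-repo workflow could drive it: run the review once per checkout. The `run_review` stub below stands in for the real Vet invocation (the actual CLI isn't documented here, so that part is an assumption), and the demo directories are created just for illustration.

```shell
#!/bin/sh
# Sketch of per-repo scoping: Vet reviews the diff in the current working
# directory's git repo, so a cross-repo workflow runs it once per checkout.
# run_review is a stand-in for the actual Vet CLI call (an assumption).
set -e

run_review() {
  # Replace this stub with the real Vet invocation.
  echo "reviewing $(basename "$PWD")"
}

sandbox=$(mktemp -d)                     # demo checkouts for illustration
mkdir -p "$sandbox/service-a" "$sandbox/service-b"

for repo in "$sandbox"/service-*; do
  ( cd "$repo" && run_review )           # subshell keeps the cwd unchanged
done
```

Running each review in a subshell keeps the `cd` from leaking between iterations, which matters because Vet keys off the working directory.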
@andrewlaack That scoping makes sense as a starting point; per-repo isolation is actually the safer default anyway, since multi-repo diffs tend to obscure accountability fast. The interesting edge case will be when agents start making coordinated changes across repos (a shared-library update that cascades into three downstream services, for example). Curious whether you're thinking about that as a Vet problem to solve, or something that stays in the orchestration layer above it?
Imbue
@avinash_matrixgard I'm interested in seeing what the future of multi-repo changes looks like, because I haven't been able to use coding agents for tasks that span multiple repos. As such, I haven't thought too much about this problem, so right now I'd say it's the responsibility of a higher-level orchestration agent. That said, I could see allowing multi-repo specification, similar to how submodules work, with the orchestration agent then passing Vet a goal based on its interpretation of the work that must get done.
@andrewlaack That makes sense as a division of responsibility. The submodule analogy is interesting; the tricky part there is that the orchestration agent needs enough context about each repo's architecture to set a meaningful goal for Vet, which starts to feel like it needs its own understanding of cross-repo dependencies before it can even frame the task correctly.
Curious whether you've thought about how Vet handles cases where the agent's interpretation of "done" diverges from what the codebase actually needs, like when the goal is technically achieved but introduces subtle regressions across service boundaries. Is that caught at the Vet level, or does it fall back to the orchestration layer to verify?
I tried this with Clawdbot and it successfully caught a 'silent failure' where the agent skipped a test. I can't get it to call Vet every time, though. Do you know how I can prompt it to always call Vet before reporting a task as complete?
Imbue
@cj_studio This was an issue I saw early on when using Vet as a skill in OpenCode. I found the wording in the skill ("call Vet after logical groups of changes") sufficed for coding, but you might want to make it stronger with something like "Run Vet at the end of each turn", or use OpenClaw hooks to enforce running Vet at session end.
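To sketch what that enforcement could look like (hedged heavily: the hook registration and the exact Vet command are assumptions, not OpenClaw's documented interface), a session-end hook can be a small gate script whose exit status blocks the turn from finishing while the review fails:

```shell
#!/bin/sh
# Hypothetical session-end gate: run a review command and block task
# completion if it reports issues. The real hook registration and the
# Vet CLI invocation are assumptions; wire in the actual command.

run_vet_gate() {
  # "$@" is the review command; a non-zero exit blocks completion.
  if "$@"; then
    echo "gate passed: safe to report the task as complete"
  else
    echo "Vet flagged issues: review before marking the task complete" >&2
    return 1
  fi
}

# Demo with a stand-in command (replace `true` with the real Vet call):
run_vet_gate true
```

The design point is that a hook's exit code is enforceable in a way prompt wording is not: the agent can forget an instruction, but it can't skip a hook the harness runs for it.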
The silent failures framing is sharp — half-implemented features and unclaimed value are the real activation killers in most B2B products, not churn from explicit dissatisfaction. Curious how Vet surfaces these gaps: is it correlating usage data against the expected activation path, or more of a qualitative signal from user sessions? The distinction matters because one tells you what's broken and the other tells you why. Would be interested in how this fits into a team's existing analytics stack — does it layer on top of tools like Mixpanel or Amplitude, or replace them for the activation layer?
Imbue
@jscanzi thanks for the comment and support! Vet is a code review tool for developers, not a product analytics layer. It reads your coding agent's conversation history alongside the diff to catch bugs, unresolved issues, false claims of completion, and so on. No usage data, activation funnels, or integrations with tools like Mixpanel or Amplitude. Vet lives in a developer's workflow.
Maybe I misunderstood your question, though. Mind restating?
@mrtibbets Thank you for your feedback Alexander!
Imbue
@jscanzi you are very welcome. 🙌
@mrtibbets interesting problem to tackle.
One of the biggest trust issues with coding agents is exactly this. Silent assumptions and fake outputs that look correct on the surface.
I like the idea of verifying the agent against the conversation history.
However, I'm curious about two things.
How does Vet handle long threads with multiple iterations of instructions?
And do you see this becoming a layer that sits between the developer and the agent permanently, almost like an AI code auditor?
Imbue
@taimur_haider1 glad to hear the conversation history angle resonates. On long threads: you should give it a try and push Vet to see how it performs through complicated conversations. We'd be excited to hear whether it fits your coding and agent behaviors.
On the second question: yes, that's a potential direction for Vet, and it's how many on our team use it internally. Less a one-time review tool, more an always-on auditor. Hope you find value in Vet like we do!
@mrtibbets Appreciate the answer, Alexander.
The ‘always-on auditor’ direction makes sense, especially as agent workflows get longer and harder to track mentally.
One thing I’ve seen with AI dev tools is that trust often breaks at the explanation layer, not just the output layer. When an agent shows the reasoning trail clearly, adoption tends to increase a lot.
Curious, are you planning to surface reasoning paths inside Vet as well, or keep it focused purely on verification?