Agent-Blackbox: git blame for AI agents – find who broke prod in 3s

I built Agent-Blackbox after losing a week debugging an 8-agent pipeline failure. Every agent pointed fingers at the next one, and logs were useless for tracing the actual decision chain.

What it does:

- 📍 Traces failures to the exact agent and decision point

- 🔗 Reconstructs the full task-based-on chain across agents

- ✅ Uses cryptographically signed receipts (JEP / JAC) so evidence isn't just "trust me bro"

How it works:

Four verbs—`J` (Judge), `D` (Delegate), `T` (Terminate), `V` (Verify)—model any accountability workflow. Every action produces a signed receipt. When something breaks, you follow the `task_based_on` chain back to the failure.

Stack:

- Rust core + Python SDK (TypeScript coming)

- IETF drafts: JEP & JAC

- MIT licensed, fully open source

If you've ever wasted hours grep'ing through agent logs or playing detective across a multi-agent pipeline, I'd love to hear how you currently handle it. Also open to feedback on the JEP/JAC approach—does cryptographic accountability matter in your stack?

GitHub: https://github.com/hjs-spec/Agent-Blackbox

2 views

Agent-Blackbox: git blame for AI agents – find who broke prod in 3s

Replies