Launching today

Benchspan
Run agent benchmarks in minutes, not hours
57 followers
BenchSpan is a benchmarking platform for AI agents. Running benchmarks is slow, expensive, and fragile. We fix that. Onboard your agent once (we onboarded Claude Code in 37 lines), run any benchmark in parallel in the cloud, and get every result in one place your whole team can see. When runs fail halfway, rerun just what broke. Compare runs side by side to see exactly where your agent is improving. Stop fighting your benchmarks and start shipping your agent.

@ritesh_malpani Curious about the rerun-only-failures part. If I'm running something like SWE-bench on a custom agent and 40 out of 500 instances fail due to network issues, does the rerun stitch those results back into the original run automatically, or do I end up with two separate result sets I need to merge?
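If a rerun does come back as a separate result set, merging it by hand comes down to a keyed overwrite on the instance ID. Here is a minimal sketch under stated assumptions: each per-instance result carries an `instance_id` and a `status`, and `InstanceResult`, `failed_ids`, and `stitch` are hypothetical names used for illustration. They are not Benchspan's API, and this is not how the platform itself stitches runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceResult:
    # Hypothetical record shape; real harness output will differ.
    instance_id: str
    status: str  # assumed values: "resolved", "unresolved", "error" (infra failure)


def failed_ids(results):
    """Return IDs whose run errored out (e.g. network issues), i.e. the ones to rerun.

    Genuine "unresolved" outcomes are NOT rerun, so retries can't inflate the score.
    """
    return [r.instance_id for r in results if r.status == "error"]


def stitch(original, rerun):
    """Merge a rerun into the original run, keyed by instance_id.

    Rerun entries replace the original entry for the same instance; the
    original ordering is preserved. Unknown IDs raise KeyError so results
    from a mismatched run can't slip in silently.
    """
    merged = {r.instance_id: r for r in original}
    for r in rerun:
        if r.instance_id not in merged:
            raise KeyError(f"rerun contains unknown instance {r.instance_id!r}")
        merged[r.instance_id] = r
    return [merged[r.instance_id] for r in original]


# Usage mirroring the question: 500 instances, 40 lost to network errors.
original = [
    InstanceResult(f"task-{i}", "error" if i < 40 else "resolved") for i in range(500)
]
to_rerun = failed_ids(original)
rerun = [InstanceResult(iid, "resolved") for iid in to_rerun]  # pretend the retries succeed
merged = stitch(original, rerun)
print(len(to_rerun), len(merged), len(failed_ids(merged)))  # 40 500 0
```

Keeping infrastructure errors separate from unresolved instances is the key design choice here. Only the former are retried, so the merged run remains a single 500-instance result set that can be compared against other runs.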