All activity
Devon Kelley left a comment
The 47% token reduction from tool search is the number that actually matters for anyone building agents. Every token you save on tool calls is compute you can spend on reasoning. In agentic workflows where the model is making dozens of tool calls per task, that compounds fast. The computer use benchmark surpassing human performance is interesting but the real question for production is...

GPT‑5.4 - OpenAI's most efficient model: less tokens, more clarity
Devon Kelley left a comment
The "any website to API" framing sells this short. What you're actually building is the execution layer for browser agents at scale, and that's a much bigger deal. Every team building browser-based agents hits the same wall: the browser is the messiest, most unpredictable environment for agents to operate in, and making that reliable enough for production is genuinely hard. Curious Kitty's...

Anything API - Any website. We deliver the API.
Devon Kelley left a comment
Security is one of the few domains where an agent-first approach genuinely makes more sense than a human-first one. Humans reviewing security alerts at scale is already broken. Most teams either drown in false positives or miss real vulnerabilities because the volume is impossible to keep up with. Alan's question about transitive dependencies is the right one. The npm supply chain attacks...

Codex Security - Our application security agent
Devon Kelley left a comment
The proactive angle is what makes this interesting. Every other coding tool waits for you to ask. The shift from "reactive copilot" to "proactive agent that catches problems before you even think to look" is a genuinely different product category. The key question is: how does it learn what "your standards" actually are? Because every team thinks they have standards until an AI starts enforcing...

Enia Code - Proactive AI that refines code & learns your standards
Devon Kelley left a comment
I vibecoded the initial prototype and then went back and re-engineered the parts that actually matter. That's the middle ground nobody talks about and I think it's the right answer for most technical founders. The thing is, agentic coding is inevitable. The product scales, the economics usually don't. So the question isn't vibecode vs engineer, it's which parts need to survive contact with...
Building SaaS in 2026: Are you vibecoding your own product or engineering it the "old way"?
Aleksej Vukomanovic
Devon Kelley left a comment
"The hardest part isn't wiring steps together, it's handling the messy, unpredictable real world." YES. This is the thing most automation tools completely ignore. You can build the prettiest flow graph in the world and it still falls apart the moment a Shopify webhook returns something unexpected or a Slack API rate limits you at the wrong time. The shift from UI design to instruction design is...

Aident AI Beta 2 - Open-world automations, managed in plain English
Devon Kelley left a comment
Jack's point about lossy compression dropping variable names is the real risk here. In agentic workflows the context window isn't just "information the model reads," it's the decision-making surface for every routing and tool call that follows. Compress the wrong thing and you don't just get a bad answer, you get a confidently wrong decision that cascades through the entire execution chain. The...

Context Gateway - Make Claude Code faster and cheaper without losing context
Devon Kelley left a comment
This is the layer everyone is going to need and nobody is building yet. The stat about finding 150 MCP servers where 50 had destructive production access and nobody on the security team knew they existed is wild but also exactly what I'd expect. Gianmarco's question about multi-agent audit trail attribution is the hard one. When an orchestrator spawns sub-agents that each hit MCP tools,...

Golf - Enterprise MCP Control Plane
Devon Kelley left a comment
The infrastructure bundling is where the real value is. Every team I've talked to building agent-powered products spends weeks just wiring up streaming, sessions, and billing before they even touch the actual agent logic. Collapsing that into one deploy command is genuinely useful. Jonathan's question below is the right one though. When the abstraction breaks (and it will), devs need to debug...

21st Agents SDK - SDK to add a Claude Code AI agent to your app
Devon Kelley left a comment
The "AI verifying AI" framing Paul mentioned below is the right one. This is the actual problem nobody talks about honestly: teams are shipping AI-generated code and just crossing their fingers. The MCP approach is smart because it keeps the testing agent inside the same context as the coding agent instead of bolting on some disconnected CI step after the fact. Real question though: when the...

TestSprite 2.1 - Agentic testing for the AI-native team.
