Why we built enforcement OUTSIDE the AI model

by Suraj Verma
Every AI safety tool today puts guardrails inside the model — prompt filtering, RLHF, Constitutional AI, output validation. They all share one flaw: if you trick the AI, the safety breaks too.

In February 2026, Claude was jailbroken and 150GB of government data was stolen. GPT-5 was broken in 24 hours. Microsoft Copilot had a zero-click vulnerability that exfiltrated files without user interaction.

We asked a simple question: what if enforcement didn't live inside the AI at all?

BoundaryAI is a separate engine. It sees only structured action data — "file.delete, count=200, scope=bulk." It doesn't read prompts, context, or reasoning. Pure deterministic rules. The same input always gives the same output.
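A deterministic engine like that can be sketched in a few lines. This is only an illustrative sketch, not BoundaryAI's actual implementation: the `Action` fields and the rules mirror the "file.delete, count=200, scope=bulk" example above, and all names and thresholds are hypothetical.

```python
# Hypothetical sketch of an out-of-model enforcement engine.
# It evaluates only structured action metadata -- never prompts,
# context, or model reasoning -- so the same input always yields
# the same verdict.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str      # e.g. "file.delete"
    count: int     # number of objects affected
    scope: str     # "single" or "bulk"

# Pure, deterministic rules: each returns "deny" or None.
RULES = [
    lambda a: "deny" if a.name == "file.delete" and a.scope == "bulk" else None,
    lambda a: "deny" if a.count > 100 else None,
]

def evaluate(action: Action) -> str:
    """First matching rule wins; default is allow."""
    for rule in RULES:
        verdict = rule(action)
        if verdict is not None:
            return verdict
    return "allow"

print(evaluate(Action("file.delete", 200, "bulk")))   # -> deny
print(evaluate(Action("file.read", 1, "single")))     # -> allow
```

Because no natural language ever reaches `evaluate`, there is nothing for a crafted prompt to manipulate; an attacker can only change the structured facts of the action itself.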

The result: 148 tests, zero adversarial bypasses. Prompt injection is architecturally irrelevant because there's no language model to inject into.

We'd love to hear from this community — does this approach resonate? What concerns would you have deploying something like this?
