What's the weakest part of AI agent security right now?

by Anthony D'Onofrio
Curious what the PH community thinks. There's a lot of noise about LLM safety, but "safety" and "security" get conflated constantly, and the actual attack surface on an agent in production is its own thing.

From where I sit, the most under-addressed failure modes seem to be:

  • Indirect prompt injection via retrieved content (RAG sources, tool outputs, even user-uploaded docs)

  • Tool/function abuse where the agent happily calls something it shouldn't (rough sketch of what I mean after this list)

  • Missing or fuzzy trust boundaries between the agent and the systems it can touch
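
For the tool-abuse point, here's a rough sketch of the kind of guard I mean: every tool call the model proposes gets checked against an allowlist and per-tool argument rules before anything actually executes. The tool names and policies below are made up for illustration, not from any particular framework.

    # Minimal sketch of a tool-call guard: a model-proposed call is validated
    # against an allowlist and per-tool argument rules before execution.
    # Tool names and policy rules here are hypothetical.

    ALLOWED_TOOLS = {
        # tool name -> argument validator (True means the call looks safe)
        "search_docs": lambda args: isinstance(args.get("query"), str) and len(args["query"]) < 500,
        "read_file":   lambda args: str(args.get("path", "")).startswith("/data/public/"),
        # deliberately NOT listed: send_email, delete_record, run_shell ...
    }

    def guard_tool_call(tool_name: str, args: dict) -> dict:
        """Decide whether a model-proposed tool call may run.

        Returns a decision dict instead of raising, so the agent loop can
        log it and feed a refusal back to the model.
        """
        validator = ALLOWED_TOOLS.get(tool_name)
        if validator is None:
            return {"allow": False, "reason": f"tool '{tool_name}' is not on the allowlist"}
        if not validator(args):
            return {"allow": False, "reason": f"arguments for '{tool_name}' failed policy checks"}
        return {"allow": True, "reason": "ok"}

    if __name__ == "__main__":
        # A call the agent should be able to make:
        print(guard_tool_call("search_docs", {"query": "quarterly report"}))
        # A call smuggled in via retrieved content, trying to escape the allowed path:
        print(guard_tool_call("read_file", {"path": "/etc/passwd"}))
        # A tool the agent was never supposed to have:
        print(guard_tool_call("send_email", {"to": "attacker@example.com"}))

The point isn't this exact code; it's that the enforcement lives outside the model, so an instruction injected through a RAG chunk or a tool output can't talk the agent into a call the policy layer won't pass.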

But I'd really like to hear what builders and security folks here are actually running into. If you're shipping anything with an LLM agent in the loop: what's the thing that keeps you up at night? And for anyone who's tried to pentest one, what surprised you?
