Aman Kaushik left a comment
I'm a bit skeptical of how this works. To keep it real, most LLM tools like Claude Code can be prompt-engineered to avoid writing code that's prone to security issues, and the same can be done via RL environments. Also, your approach isn't updating the model itself, which just wastes tokens. You're just writing tests for code, which frankly every software project already does 🤷

