Keystone - Teach your repo how to run itself
Keystone automatically configures a working devcontainer for any git repo.
Give it a repo. Get back a Dockerfile, devcontainer.json, and a passing test runner. It runs a coding agent inside a sandboxed Modal environment so your machine is never touched.
It’s open-source, works with Claude Code and Codex, and the dev containers it produces work in VS Code and GitHub Codespaces.
pip install imbue-keystone
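If you haven't used dev containers before, here is the general shape of a minimal devcontainer.json that builds from a Dockerfile. This is a rough sketch and is not Keystone's actual output, which will differ per repo. `devcontainer_config` and `render` are hypothetical helpers made up for illustration. The `name`, `build.dockerfile`, and `build.context` keys come from the Dev Containers spec.

```python
import json


def devcontainer_config(name: str) -> dict:
    """Build a minimal devcontainer.json structure (hypothetical helper, illustration only)."""
    return {
        "name": name,
        "build": {
            # Dockerfile path, relative to the devcontainer.json file itself.
            "dockerfile": "Dockerfile",
            # Use the repository root (one level up from .devcontainer/) as the build context.
            "context": "..",
        },
    }


def render(config: dict) -> str:
    """Serialize the config as pretty-printed JSON with a trailing newline."""
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Prints the JSON instead of writing it, so running this touches nothing on disk.
    print(render(devcontainer_config("my-repo")))
```

Saved as `.devcontainer/devcontainer.json` next to a `Dockerfile`, a config like this is what VS Code and GitHub Codespaces read to build the container.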



Replies
Imbue
@mrtibbets How accurate has Keystone been so far at generating correct Docker configs for complex repos with multiple deps, like ML projects?
Imbue
@mrtibbets @swati_paliwal Lead developer here: we've spent a lot of time tuning Keystone to handle a wide range of repositories, from simple Python projects to complex polyglot repos like SciPy, OpenCV, PyTorch, and TensorFlow.
Agent performance definitely varies. We're characterizing this carefully and will go into more detail in an upcoming research report, so stay tuned!
A short answer to your question: we see Keystone getting a project's tests to pass inside a Docker container 95+% of the time with Claude Opus.
Can I specify my preferences before the process starts, or does it automatically choose the best option?
Imbue
@natalia_iankovych Right now, the Keystone CLI offers some configuration flags to specify which agent is used (Claude, Codex, or OpenCode), and some constraints like a deadline and a maximum inference cost.
We don't currently support preferences related to Dockerfile configuration, but these would be relatively easy to add. Feel free to leave us a feature request here if what we've already got doesn't meet your needs. Pull requests are also very welcome!
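A deadline and a maximum inference cost can be thought of as stop conditions checked before each agent step. The sketch below is a hypothetical illustration of that idea. It is not Keystone's implementation or CLI. `RunBudget`, `run_agent`, and the per-step costs are made-up names and values.

```python
import time
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class RunBudget:
    """Hypothetical stop conditions for an agent run: a wall-clock deadline and a cost cap."""
    deadline_seconds: float
    max_cost_usd: float
    start: float = field(default_factory=time.monotonic)
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        # Add the cost of one completed inference call to the running total.
        self.spent_usd += cost_usd

    def exhausted(self, now: Optional[float] = None) -> bool:
        # The budget is used up once the deadline has passed or spending has reached the cap.
        elapsed = (time.monotonic() if now is None else now) - self.start
        return elapsed >= self.deadline_seconds or self.spent_usd >= self.max_cost_usd


def run_agent(step_costs: Iterable[float], budget: RunBudget) -> int:
    """Run hypothetical agent steps until they finish or the budget runs out; return steps completed."""
    completed = 0
    for cost in step_costs:
        if budget.exhausted():
            break
        budget.record(cost)  # cost is only known after the call completes
        completed += 1
    return completed
```

Because a step's cost is only known after it runs, this check-before-each-step design can overshoot the cap by at most one call. A stricter variant would need an estimate of each call's cost up front.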
Imbue
@thad_hughes_imbue It has been great to see you evolve this container-setup work over the last year, from something just for us into something anyone can use.
Imbue
Proud of you @thad_hughes_imbue for the work shipping this! I've learned so much from using Keystone as a benchmark to understand cost, performance, and failure modes of Claude Code vs. Codex vs. OpenCode.