We benchmarked Claude Code refactoring, with and without code health guidance
We ran a benchmark to see how well Claude Code refactors legacy code on its own, then repeated the same test with code-health guidance supplied via an MCP server.
To limit vendor bias, we used a public dataset of 25,000 source code files from competitive programming, including carefully crafted unit tests.
We assessed agent correctness by running those tests.
We measured the Code Health impact using CodeScene.
(See our research paper "Code for Machines, Not Just Humans" for more details on the methodology and data.)
MCP-guided Claude Code achieved 2.5x more Code Health improvements than unguided refactoring.
Does the state of your code determine an AI agent's performance?
Our team recently concluded in peer-reviewed research that code health determines AI performance. The study "Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics" found that when agents operate on unhealthy code, defect risk increases by at least 60%.
It was a large-scale study of 5,000 real programs using six different LLMs to refactor code while keeping all tests passing.
