Romanela Polutak

CodeHealth MCP Server by CodeScene
CMO @CodeScene

We benchmarked Claude Code refactoring, with and without code health guidance

We ran a benchmark to see how well @Claude Code actually refactors legacy code on its own, and then redid the same test with code health guidance via an MCP server.

  • To limit any vendor bias, we used a public dataset of 25,000 source code files from competitive programming, including carefully crafted unit tests.

  • We assessed agent correctness by running those tests. 

  • We measured the Code Health impact using CodeScene.

  • (See our research paper Code for Machines, Not Just Humans for more details on the methodology and data.)
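
The evaluation loop implied by the steps above can be sketched roughly as follows. This is a minimal illustration, not CodeScene's actual harness: the `RefactoringResult` type, the field names, and all numbers here are made up for the example, and real Code Health scores come from the CodeScene tool rather than hard-coded values.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RefactoringResult:
    tests_passed: bool    # did the refactored file still pass its unit tests?
    health_before: float  # Code Health score before refactoring
    health_after: float   # Code Health score after refactoring

def summarize(results):
    """Correctness rate and mean Code Health delta for one condition."""
    correct = [r for r in results if r.tests_passed]
    correctness = len(correct) / len(results)
    # Only count health deltas for refactorings that stayed correct.
    improvement = mean(r.health_after - r.health_before for r in correct)
    return correctness, improvement

# Illustrative, made-up numbers -- not the benchmark's real data:
unguided = [RefactoringResult(True, 6.0, 6.4), RefactoringResult(True, 5.0, 5.2),
            RefactoringResult(False, 4.0, 4.1)]
guided   = [RefactoringResult(True, 6.0, 7.0), RefactoringResult(True, 5.0, 6.1),
            RefactoringResult(True, 4.0, 4.9)]

_, delta_unguided = summarize(unguided)
_, delta_guided = summarize(guided)
print(f"guided improves health {delta_guided / delta_unguided:.1f}x more")
```

The key design choice is to measure the two axes separately: correctness is gated by the dataset's own unit tests, while the Code Health delta is only tallied for refactorings that kept those tests green.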

MCP-guided Claude Code achieved 2.5x more Code Health improvements than unguided refactoring.

Does the state of your code determine an AI agent's performance?

Our team recently concluded in peer-reviewed research that code health determines AI performance. The study "Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics" found that when agents operate on unhealthy code, the defect risk increases by at least 60%.

It was a large-scale study of 5,000 real programs using six different LLMs to refactor code while keeping all tests passing.