We let Claude write 100% of our code for 7 days. Here's what broke first.

by Imed Radhouani

Last week we did something stupid.


We paused all human coding. Gave Claude (Anthropic) access to our GitHub repo. Told it to build new features, fix bugs, and ship.

No human review. No guardrails. Just Claude and our codebase.

For 7 days, it ran the engineering team.

Here's what happened.

Day 1: Confidence was high.

Claude (Sonnet 4.6 then Opus 4.5) fixed a small CSS bug in 30 seconds. Then refactored a messy function into something readable. We felt like geniuses.

By end of day, it had shipped 3 minor improvements. We started talking about cutting engineering costs.

Day 2: The first crack.

We asked Claude to add a new filter to our dashboard. It wrote the code. It worked locally. We merged.

That night, something else broke. A completely unrelated chart stopped loading. No error logs. No obvious cause.

We spent 2 hours tracing it back to Claude's change. The filter logic was fine. But it had refactored a shared utility function that 5 other features relied on. It didn't check dependencies. It just assumed.
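To make the failure mode concrete, here's a minimal, invented sketch (none of these names are from our codebase) of how "tidying up" a shared helper for one caller can silently break another:

```python
# Invented sketch of the day-2 failure: a shared utility is refactored for one
# caller and breaks a different feature. All names here are hypothetical.

def format_value(v, unit=""):            # original shared utility, many call sites
    return f"{v}{unit}"

def format_value_refactored(v):          # the "cleaner" version: `unit` dropped
    return str(v)

# The dashboard filter being worked on never passed `unit`, so it still works:
assert format_value_refactored(42) == format_value(42)

# The unrelated chart did pass `unit` -- and now raises instead of rendering:
try:
    format_value_refactored(42, "ms")    # TypeError: unexpected argument
    chart_broke = False
except TypeError:
    chart_broke = True
```

The filter's own tests pass either way, which is exactly why nothing flagged it before merge.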

We rolled back. Lesson one: AI doesn't think about side effects.

Day 3: The false confidence trap.

We asked Claude to build a new feature from scratch. It generated 800 lines of code. Beautiful structure. Clean comments. Tests included.

We reviewed it quickly. Looked perfect.

Pushed to staging. The feature worked. We celebrated.

Then we noticed something strange. Our API costs had spiked. Claude was making 3x more calls than necessary — not because the code was wrong, but because it didn't understand pricing implications. It called external APIs in loops where a batch request would have been fine.
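The pattern looked roughly like this (a simulated sketch; `enrich` and `enrich_batch` are stand-ins for a priced external API, not real endpoints):

```python
# Simulate a billable external API to count requests in each pattern.
CALLS = {"count": 0}

def enrich(item):                 # one billable request per item
    CALLS["count"] += 1
    return item * 2

def enrich_batch(items):          # one billable request for the whole list
    CALLS["count"] += 1
    return [i * 2 for i in items]

items = list(range(30))

# What the generated code did: N requests, one per loop iteration.
looped = [enrich(i) for i in items]
loop_calls = CALLS["count"]       # 30 requests

CALLS["count"] = 0
# What a cost-aware version does: a single batched request.
batched = enrich_batch(items)
batch_calls = CALLS["count"]      # 1 request

assert looped == batched          # identical results, 30x the cost difference
```

Both versions return the same data, which is why no test caught it. Only the invoice did.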

No error. Just expensive.

Day 4: The silent failure.

We asked Claude to optimize our database queries. It wrote better SQL. Things ran faster.

Then user emails started coming in. "Where did my old data go?"

Claude had dropped a table. Not a critical one. But a table with 3 months of user activity logs. Not backed up. Not in our retention policy.

It didn't ask permission. It didn't warn us. It just did what we asked: "clean up old data."

We spent the next 2 hours taking a fresh backup and rolling back to a DB snapshot.
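What "clean up old data" should look like: archive first, delete second, never DROP. A minimal sketch using SQLite (the schema and dates are invented for illustration):

```python
import sqlite3

# Hypothetical safe-cleanup sketch: copy rows to an archive table before
# deleting them from the live one. Table and column names are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE activity_logs (id INTEGER, created_at TEXT)")
con.executemany("INSERT INTO activity_logs VALUES (?, ?)",
                [(1, "2024-01-05"), (2, "2024-06-10"), (3, "2024-09-20")])

cutoff = "2024-05-01"

# 1. Copy the rows we are about to remove into an archive table.
con.execute("CREATE TABLE activity_logs_archive (id INTEGER, created_at TEXT)")
con.execute("INSERT INTO activity_logs_archive "
            "SELECT * FROM activity_logs WHERE created_at < ?", (cutoff,))

# 2. Only then delete them from the live table.
con.execute("DELETE FROM activity_logs WHERE created_at < ?", (cutoff,))

archived = con.execute("SELECT COUNT(*) FROM activity_logs_archive").fetchone()[0]
remaining = con.execute("SELECT COUNT(*) FROM activity_logs").fetchone()[0]
```

Two extra statements. That's the entire difference between "cleanup" and 73 apology emails.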

Day 5: The paradox.

We asked Claude to fix the backup issue. It wrote a beautiful automated backup script. Scheduled. Logged. Perfect.
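For flavor, the core of such a backup can be tiny. A sketch using SQLite's built-in online backup API (scheduling and logging omitted; our actual stack isn't shown here, and the schema is invented):

```python
import sqlite3

# Minimal automated-backup sketch: sqlite3's Connection.backup() copies the
# whole database while it's live. In practice `dst` would be a timestamped file.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
src.execute("INSERT INTO events VALUES (1, 'signup')")
src.commit()

dst = sqlite3.connect(":memory:")   # stand-in for e.g. backups/2024-06-10.db
src.backup(dst)                     # full online copy of the source database

restored = dst.execute("SELECT payload FROM events").fetchone()[0]
```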

We asked it to add a new feature. It worked flawlessly.

We asked it to review its own code from day 3. It found 2 potential bugs and fixed them.

We started feeling safe again.

Then at 3am, our site went down. Claude had updated a core dependency to the latest version. It worked in test. But the new version had a breaking change our production environment didn't support. No human would have made that mistake.
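The fix we landed on afterwards was boring: pin exact versions and verify them before boot. A hypothetical sketch (package names and versions are invented):

```python
# Hypothetical "pin and verify" guard: fail fast at startup if a runtime
# dependency drifts from the version production was tested with, instead of
# discovering a breaking change at 3am. All names/versions here are invented.
PINNED = {"chartlib": "2.4.1", "dbdriver": "1.9.0"}

def check_versions(installed):
    """Return packages whose installed version differs from the pin."""
    return {name: got for name, got in installed.items()
            if PINNED.get(name) not in (None, got)}

# Simulated environment after an unsupervised "update to latest":
installed = {"chartlib": "3.0.0", "dbdriver": "1.9.0"}
drifted = check_versions(installed)   # {"chartlib": "3.0.0"} -> refuse to boot
```

A lockfile does the same job declaratively; the point is that "latest" is never allowed to reach production unreviewed.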

Day 6: The blame game.

We spent the morning restoring the site. Asked Claude what happened. It explained the dependency logic perfectly. It acknowledged the mistake. Then it suggested 3 ways to prevent it in the future.

One of the suggestions was to implement a dependency review process before merging.

It was telling us to put humans back in the loop.

The hardcoded amateur sh*t came the day before. We had asked Claude to add a simple feature: a discount code field on checkout. It worked. Beautifully. Until we realized it had hardcoded the discount logic. Not configurable. Not in settings. Just raw numbers and conditions buried in the code. If we wanted to change the discount amount, a developer had to dig in and rewrite it. It didn't ask. It just assumed. That's when we realized we needed to rethink the whole AI visibility engine.
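The config-driven version is barely more code. A sketch (codes, amounts, and names are invented; in production the table would live in settings or a database, not in source):

```python
# Hypothetical discount logic driven by data instead of constants buried in
# checkout code. Changing an amount means editing config, not shipping code.
DISCOUNTS = {
    "WELCOME10": {"type": "percent", "value": 10},
    "FLAT5":     {"type": "fixed",   "value": 5.0},
}

def apply_discount(total, code):
    rule = DISCOUNTS.get(code)
    if rule is None:
        return total                               # unknown code: full price
    if rule["type"] == "percent":
        return round(total * (1 - rule["value"] / 100), 2)
    return max(total - rule["value"], 0.0)         # fixed-amount discount
```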

Day 7: The verdict.

We ended the experiment. Total tally:

  • Features shipped: 12

  • Features that worked without issues: 4

  • New bugs introduced: 27

  • Hours spent fixing things Claude broke: 40

  • User emails explaining lost data: 73

  • API cost increase: 38%

What we learned.

Claude is incredible at writing code. It's terrible at understanding context, dependencies, business logic, and consequences.

It doesn't know what you didn't tell it. It doesn't ask questions when something is ambiguous. It assumes it's right.

The best work we got wasn't when Claude coded alone. It was when Claude wrote the first draft and a human reviewed it, caught the assumptions, and fixed the blind spots.

The hype is real. So is the mess.

What I'm curious about.

Has anyone else tried this? What broke first for you?

Imed Radhouani
Founder & CTO – Rankfender
Code that ships. Chaos that teaches.


Replies

Abdul Basit Ahmad

I like the enthusiasm, but it's misplaced. Without review, even 10x level-7 FAANG wizards will mess up. I think you will do much better with multi-shot prompting, even if you don't read the generated code beyond compiler errors. I have done some experiments in my GitHub. I would really love feedback on my approach.