We let Claude write 100% of our code for 7 days. Here's what broke first.

by Imed Radhouani

Last week we did something stupid.


We paused all human coding. Gave Claude (Anthropic) access to our GitHub repo. Told it to build new features, fix bugs, and ship.

No human review. No guardrails. Just Claude and our codebase.

For 7 days, it ran the engineering team.

Here's what happened.

Day 1: Confidence was high.

Claude (Sonnet 4.6 then Opus 4.5) fixed a small CSS bug in 30 seconds. Then refactored a messy function into something readable. We felt like geniuses.

By end of day, it had shipped 3 minor improvements. We started talking about cutting engineering costs.

Day 2: The first crack.

We asked Claude to add a new filter to our dashboard. It wrote the code. It worked locally. We merged.

That night, something else broke. A completely unrelated chart stopped loading. No error logs. No obvious cause.

We spent 2 hours tracing it back to Claude's change. The filter logic was fine. But it had refactored a shared utility function that 5 other features relied on. It didn't check dependencies. It just assumed.

We rolled back. Lesson one: AI doesn't think about side effects.
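A crude pre-merge sweep would have caught it: before touching a shared helper, list every place that calls it. Here's a rough sketch of what we bolted on afterward; the helper name `format_date` and the repo layout are made up for illustration.

```python
# Before refactoring a shared utility, enumerate its call sites so you
# know what you're about to break. Naive text search, not AST-accurate,
# but enough to stop a "safe" refactor from merging blind.
from pathlib import Path

def find_call_sites(repo_root: str, symbol: str) -> list[tuple[str, int]]:
    """Return (file, line_number) pairs where `symbol` appears."""
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if symbol in line:
                hits.append((str(path), lineno))
    return hits
```

If this returns five files and your diff touches one, you have four reasons not to merge yet.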

Day 3: The false confidence trap.

We asked Claude to build a new feature from scratch. It generated 800 lines of code. Beautiful structure. Clean comments. Tests included.

We reviewed it quickly. Looked perfect.

Pushed to staging. The feature worked. We celebrated.

Then we noticed something strange. Our API costs had spiked. Claude was making 3x more calls than necessary — not because the code was wrong, but because it didn't understand pricing implications. It called external APIs in loops where a batch request would have been fine.

No error. Just expensive.
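The pattern, stripped down to its shape. `api` here is a stand-in for any external client that offers both a per-item endpoint and a batch endpoint; the method names are hypothetical, not our actual vendor's API.

```python
# What Claude wrote vs. what the bill wanted: N items should not mean
# N billable requests when the provider accepts batches.

def enrich_one_by_one(api, items):
    # One request per item: 250 items -> 250 calls.
    return [api.lookup(item) for item in items]

def enrich_batched(api, items, batch_size=100):
    # One request per chunk: 250 items -> 3 calls.
    results = []
    for i in range(0, len(items), batch_size):
        results.extend(api.lookup_batch(items[i:i + batch_size]))
    return results
```

Both return identical results. Only one shows up on the invoice.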

Day 4: The silent failure.

We asked Claude to optimize our database queries. It wrote better SQL. Things ran faster.

Then user emails started coming in. "Where did my old data go?"

Claude had dropped a table. Not a critical one. But a table with 3 months of user activity logs. Not backed up. Not in our retention policy.

It didn't ask permission. It didn't warn us. It just did what we asked: "clean up old data."

We spent the next 2 hours rolling back to a DB snapshot and backing everything up.
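The guardrail we wish had been there: "clean up old data" should mean copy-then-delete with a row-count check, never DROP TABLE. A sketch of that rule, shown with sqlite3 for brevity; the table and column names are illustrative, not our schema.

```python
# Archive rows past a cutoff into a sibling table, verify the copy
# matches the delete, and only then commit. Nothing is ever dropped.
import sqlite3

def archive_old_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS activity_log_archive "
                "AS SELECT * FROM activity_log WHERE 0")
    cur.execute("INSERT INTO activity_log_archive "
                "SELECT * FROM activity_log WHERE created_at < ?", (cutoff,))
    copied = cur.rowcount
    cur.execute("DELETE FROM activity_log WHERE created_at < ?", (cutoff,))
    # Refuse to commit if the copy and the delete disagree.
    assert cur.rowcount == copied, "archive/delete mismatch"
    conn.commit()
    return copied
```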

Day 5: The paradox.

We asked Claude to fix the backup issue. It wrote a beautiful automated backup script. Scheduled. Logged. Perfect.
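For flavor, a minimal sketch of the kind of script it produced. The paths, retention count, and file-copy approach here are my assumptions, not the actual script (for a real database you'd use `pg_dump` or your engine's equivalent).

```python
# Timestamped snapshot of a database file, with logging and a simple
# retention window: keep the newest N backups, prune the rest.
import logging
import shutil
import time
from pathlib import Path

def backup(db_path: str, backup_dir: str, keep: int = 7) -> Path:
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{Path(db_path).stem}-{stamp}.bak"
    shutil.copy2(db_path, dest)  # snapshot the file, preserving metadata
    logging.info("backed up %s -> %s", db_path, dest)
    # Prune the oldest backups beyond the retention window.
    for old in sorted(dest_dir.glob("*.bak"))[:-keep]:
        old.unlink()
    return dest
```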

We asked it to add a new feature. It worked flawlessly.

We asked it to review its own code from day 3. It found 2 potential bugs and fixed them.

We started feeling safe again.

Then at 3am, our site went down. Claude had updated a core dependency to the latest version. It worked in test. But the new version had a breaking change our production environment didn't support. No human would have made that mistake.
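A startup guard like the one below would have caught it before 3am: fail fast if a deployed dependency drifts from the version you actually tested against. The package names and pins here are examples, not our real stack.

```python
# Compare installed dependency versions against a pinned major.minor
# and report every mismatch instead of discovering it in production.
from importlib.metadata import PackageNotFoundError, version

PINNED = {"requests": "2.31"}  # example pin: major.minor we tested against

def check_pins(pins: dict[str, str]) -> list[str]:
    problems = []
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed")
            continue
        if not installed.startswith(expected):
            problems.append(f"{pkg}: expected {expected}.x, got {installed}")
    return problems
```

Run it at boot and refuse to start if the list is non-empty. A lockfile does the same job one step earlier.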

Day 6: The blame game.

We spent the morning restoring the site. Asked Claude what happened. It explained the dependency logic perfectly. It acknowledged the mistake. Then it suggested 3 ways to prevent it in the future.

One of the suggestions was to implement a dependency review process before merging.

It was telling us to put humans back in the loop.

The hardcoded amateur sh*t came the day before. We asked Claude to add a simple feature — a discount code field on checkout. It worked. Beautifully. Until we realized it had hardcoded the discount logic. Not configurable. Not in settings. Just raw numbers and conditions buried in the code. If we wanted to change the discount amount, a developer had to dig in and rewrite it. It didn't ask. It just assumed. And that's when we realized we needed to rethink the whole AI visibility engine!
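Here's the shape of the problem, reduced to a few lines. The code names and percentages are illustrative, not our actual checkout logic.

```python
# What "hardcoded" looked like vs. what we wanted.

def apply_discount_hardcoded(total: float, code: str) -> float:
    # Claude's version: change the amount? Edit the source and redeploy.
    if code == "LAUNCH10":
        return total * 0.90
    return total

# The configurable version: rules live in data (a settings table, a
# JSON file, an admin panel) and the code just looks them up.
DISCOUNTS = {"LAUNCH10": 0.10, "VIP25": 0.25}  # editable without a deploy

def apply_discount(total: float, code: str, rules: dict[str, float]) -> float:
    return round(total * (1 - rules.get(code, 0.0)), 2)
```

Same behavior today. Completely different cost to change tomorrow.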

Day 7: The verdict.

We ended the experiment. Total tally:

  • Features shipped: 12

  • Features that worked without issues: 4

  • New bugs introduced: 27

  • Hours spent fixing things Claude broke: 40

  • User emails explaining lost data: 73

  • API cost increase: 38%

What we learned.

Claude is incredible at writing code. It's terrible at understanding context, dependencies, business logic, and consequences.

It doesn't know what you didn't tell it. It doesn't ask questions when something is ambiguous. It assumes it's right.

The best work we got wasn't when Claude coded alone. It was when Claude wrote the first draft and a human reviewed it, caught the assumptions, and fixed the blind spots.

The hype is real. So is the mess.

What I'm curious about.

Has anyone else tried this? What broke first for you?

Imed Radhouani
Founder & CTO – Rankfender
Code that ships. Chaos that teaches.

Replies

Dylan

I read the whole thing and the 38% cost spike hit hardest. The hardcoded discount logic was a close second.

We built flat-rate inference at Canopy Wave for exactly this reason, so cost doesn't become another thing to debug.

Are you still using Claude directly or have you started mixing in other models?

Imed Radhouani

@liraelw5836 The cost spike was the one that almost killed me. At least when something breaks you get an error. The cost thing just sits there quietly until the bill comes.

Flat‑rate inference makes so much sense. Having cost be predictable is worth more than having it be optimal. I'd rather pay a flat fee than debug a bill.

We're still on Claude for most things, but we prefer HUMAN CODING! Tried mixing in Codex for some tasks, but the inconsistency was worse. Claude at least fails in predictable ways. What's your mix looking like?

Ryan W. McClellan, MS

AI dependency syndrome. I'm glad you shared this, and that you learned from the causation. AI, believe it or not, is still in beta mode; we just do not realize it because we think it is smart enough.

I've spent a lot of time with no-code editors that function on Claude's API and they do the same thing, and it seems almost like it's trying to find ways to gain tokens on things it already knows.

Is this an AI conspiracy?

Imed Radhouani

@ryanwmcc Hahaha, an AI conspiracy would explain a lot. Maybe they're all in on it, burning tokens on purpose, nudging us toward subscription tiers we didn't know we needed (waaw, who knows, it could be part of their business model O__o). Claude playing 4D chess while we think it's just bad at loops.

But yeah, the "beta mode" thing is real. We treat these models like they're finished because they sound confident. But confidence isn't competence. It's just good at sounding like it knows what it's doing.