I had done everything right. Or at least, everything the current playbook says to do. Used AI to fill the gaps, reviewed it myself, then handed it off to more AI.

This past week, I got handed a project that required backend work. Not because I have backend experience – I don’t, really – but because “AI can handle the parts you don’t know.” And honestly, that’s not entirely wrong. Claude Code got me through it. I wrote what I could, leaned on it for what I couldn’t, and when the dust settled, the code was in decent shape. I ran my own review. Frontend looked solid. A few tweaks. Tests passed. Manual walkthrough, nothing glaring.
I pushed the branch, opened the PR, and handed it to Cursor for a second pass. A couple of issues surfaced. Fixed them. Then CodeRabbitAI got its turn. A couple more. Fixed those too. Three layers of review, every issue resolved. I tagged my human colleagues, feeling pretty good about the whole thing.
That’s where the confidence ended.
The Comment That Made Me Feel Two Things at Once
My colleague, a full-stack engineer with real backend experience, left a comment on the PR. They knew going in that I didn’t have much backend context, so they didn’t just tell me what was wrong. Instead, they asked me to try something: open the app in two windows, change profiles in one, make updates in the other, try to update and save, and see what happens.
I did it. The profiles fell out of sync.
Once I saw it, it was obvious – the kind of thing where you go “oh, of course” the second it breaks. But I never would have thought to test that scenario on my own. I hadn’t spent years debugging backend state. I didn’t have the mental model for what could go wrong when a user has two sessions running simultaneously and starts mixing state between them. That’s hard-won intuition, built from experience I just don’t have yet. No AI tool flagged it either. Not Claude Code, not Cursor, not CodeRabbitAI. The bug just sat there, quiet, untouched by every layer of “smart” review I’d thrown at it.
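For what it’s worth, the class of bug has a name: the lost update, where two sessions load the same record, both edit it, and the last save silently clobbers the other. One common mitigation is optimistic concurrency control, where every record carries a version number and a save is rejected if it was based on a stale version. Here’s a minimal sketch of that idea; all the names (ProfileStore, StaleWriteError) are hypothetical, not from the actual PR:

```python
class StaleWriteError(Exception):
    """Raised when a save is based on an outdated version of the record."""


class ProfileStore:
    """Toy in-memory store demonstrating version-checked (optimistic) writes."""

    def __init__(self):
        # profile_id -> (version, data)
        self._profiles = {}

    def load(self, profile_id):
        """Return (version, data) so the caller knows which version it edited."""
        version, data = self._profiles.get(profile_id, (0, {}))
        return version, dict(data)

    def save(self, profile_id, based_on_version, data):
        """Accept the write only if no one else has saved since we loaded."""
        current_version, _ = self._profiles.get(profile_id, (0, {}))
        if based_on_version != current_version:
            # Another session updated the profile after this one loaded it.
            raise StaleWriteError(
                f"profile {profile_id}: based on v{based_on_version}, "
                f"but store is at v{current_version}"
            )
        new_version = current_version + 1
        self._profiles[profile_id] = (new_version, dict(data))
        return new_version
```

In the two-window scenario, both windows load version 0; the first save bumps the store to version 1, and the second window’s save (still based on version 0) raises StaleWriteError instead of silently overwriting, so the app can prompt the user to reload. Real systems push the same check into the database (a WHERE version = ? clause) or the HTTP layer (ETag and If-Match), but the shape of the fix is the same.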
What I appreciated most was how my colleague handled it. They didn’t write the fix in the comment. They didn’t tell me the answer. They trusted me to find it once I could see it. That’s a specific kind of mentorship, the kind that respects your ability to learn while acknowledging the gap. I was humbled and genuinely grateful at the same time, which is a weird combination but also kind of the best possible outcome of a code review.
The Myth of the Foolproof Stack
I’m not anti-AI. I use it every day, and I’m not going to pretend otherwise. But I want to be honest about what that PR showed me, because I don’t think the experience is unique to me.
We’ve been sold a version of AI-assisted development where layering more tools means fewer things fall through the cracks. More coverage, fewer blind spots, better outcomes. And when you’re in the middle of it, running your code through three different review passes, watching issues get flagged and resolved, it genuinely feels airtight. The pitch is clean. The reality, as it turns out, is messier.
Here’s the part that got me: CodeRabbit, one of the tools I used on that very PR, called 2025 “the year the internet broke.” Change failure rates up 30%. Incidents per pull request up 23.5%. Their own data, about the exact category of tool I was using to feel safe. A separate 2026 Sonar survey found that teams using AI coding tools are 46% more likely to experience production incidents than those that don’t. That’s not a marginal difference.
Amazon’s internal AI agent, Kiro, autonomously decided the fix for a minor issue was to delete and rebuild a production environment. A thirteen-hour outage in AWS. Amazon called it “user error.” Sure, ok. Forrester now predicts that 50% of companies that attributed headcount reductions to AI will quietly rehire for those same roles by 2027. Google’s own numbers show 20% of their 2025 AI engineering hires were former employees they’d previously let go.
That’s not optimization. That’s a very expensive loop.
The Short Stick
I understand AI is getting better. The models are improving, the tooling is maturing, and some of these failure modes will happen less as that continues. I genuinely believe that. I’m not writing this as someone who thinks we should go back to writing everything by hand.
But we’re living in the meantime. And in the meantime, companies are cutting engineers based on anticipated future capabilities, not demonstrated current ones. They’re shipping AI-reviewed code into production, watching things break, and then quietly hiring people back with no acknowledgment that the original bet didn’t pay off. The people who got laid off aren’t getting apologies or reinstatement. They’re getting contractor gigs at lower pay if they’re lucky. The idea that humans are optional is already starting to crack under its own evidence, but the cost of that lesson is being absorbed by the people who can least afford it.
The most valuable thing in that PR wasn’t a tool. It was a colleague who understood what I didn’t know, cared enough to teach me instead of just writing the fix themselves, and trusted me to get there once I could see the problem. That’s not something I can prompt my way to. No model is going to replicate that specific kind of knowledge transfer, built from real experience, offered with actual generosity.
AI caught some bugs on that PR. A human caught the one that mattered.