I Use AI All Day. I Still Won't Let It Own the Merge.

Everyone's talking about agentic coding in 2026. The charts look great. But if you actually ask engineers what they're willing to hand off end-to-end, the room gets quiet. That gap isn't hypocrisy — it's the whole story.

There’s a weird tension in my Slack right now.

Half the messages are screenshots of some agent finishing a refactor in twenty minutes. The other half are senior engineers quietly asking: did anyone actually read that PR?

I’m not immune. I reach for Claude or Cursor constantly for drafts, tests, glue code, the boring stuff. I’m not embarrassed about that; it’s muscle memory. But there’s a line I rarely cross: I don’t merge something I haven’t really read, and I don’t pretend the model “owns” an outcome the way a teammate would.

That’s not me being precious about craft. It’s me noticing what the data already says if you squint past the marketing.

The part the Twitter thread skips

You’ve seen the same headlines I have. Anthropic’s agentic coding report for 2026 landed, along with Claude’s companion write-up, and bounced around dev Twitter. The plot twist nobody wants to post as a hot take: usage is everywhere; blind delegation is not.

Which makes sense the second you stop treating teams like monoliths.

If you’ve ever paired with a good junior, you know what trust feels like. You share context. They ask when they’re stuck. They have a reason not to ship garbage — their name is on the line.

Models are incredible at volume and breadth. They don’t get paged. They don’t get embarrassed at the retro. When they’re wrong, they’re often wrong in a calm, confident voice, in code that passes CI until reality hits.

So we do the rational thing: we use AI constantly, but we only fully delegate the slices where checking the work is cheap or the blast radius is tiny. The rest is “you draft, I sign.”

That mismatch — lots of help, little true handoff — is what I’ve started calling the trust gap. Not a moral failing. A signal.

When “just ship it” bites

The failure mode isn’t “we adopted too much AI.” It’s softer than that.

It’s the Friday PR that’s 800 lines because the agent was on a roll. It’s green checks because the tests never covered the edge case. It’s someone saying “looks fine” because the diff blurred together.

How trust quietly rots

One ugly release can unwind months of cultural goodwill. I’ve watched it happen. The fix isn’t “turn off the robots.” It’s boring: be honest about what you’re delegating and how you verify it.

What I’ve seen work (in real teams, not decks)

Nobody needs another manifesto. Here’s the stuff that actually sticks when I visit companies that aren’t just demoing.

They sort the work before anyone opens a chat window. Some tickets are great for agents: mechanical refactors behind strong types, regenerating boilerplate when golden tests exist, syncing docs. Some are messy-human work: auth paths, cross-service invariants, anything your lawyer would care about. If everything defaults to “let the model try,” your reviewers quietly quit.
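
Here’s roughly what that sorting can look like once it’s written down. This is a minimal sketch, not anyone’s real policy: the tag names, the `Ticket` fields, and the three routing labels are all assumptions I made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical tags; a real team would pick its own vocabulary.
HUMAN_LED_TAGS = frozenset({"auth", "cross-service", "legal"})
AGENT_FRIENDLY_TAGS = frozenset({"mechanical-refactor", "boilerplate", "docs-sync"})


@dataclass
class Ticket:
    title: str
    tags: set[str] = field(default_factory=set)
    has_golden_tests: bool = False  # is checking the result cheap?


def route(ticket: Ticket) -> str:
    """Return 'human-led', 'delegate', or 'draft-then-review' for a ticket."""
    # Risky areas always win, even if the ticket also looks mechanical.
    if ticket.tags & HUMAN_LED_TAGS:
        return "human-led"
    # Full handoff only when the work is mechanical AND cheap to verify.
    if ticket.tags & AGENT_FRIENDLY_TAGS and ticket.has_golden_tests:
        return "delegate"
    # Everything else: the agent drafts, a human signs.
    return "draft-then-review"
```

The ordering is the point. A ticket tagged both `auth` and `boilerplate` goes to a human, and nothing is fully delegated unless something other than a tired reviewer can check it.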

They treat long runs like engineering, not vibes. If your agent is going to work for an hour, it needs checkpoints, resumability, a place to surface uncertainty. Anthropic’s write-up on harness design for long-running work puts a name on something good teams were already doing: stop treating this like a long text thread and start treating it like a job with state.
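
To make “a job with state” concrete, here’s a small sketch using only the standard library. It is not Anthropic’s harness or anyone’s production code. The `run_job` function, the JSON checkpoint layout, and the `(note, needs_review)` return convention are assumptions of mine.

```python
import json
from pathlib import Path


def run_job(steps, checkpoint: Path) -> dict:
    """Run named steps in order, saving progress after each one.

    `steps` is a list of (name, callable) pairs. Each callable returns
    (note, needs_review) or raises if it cannot continue.
    """
    state = {"done": [], "notes": {}, "needs_review": []}
    if checkpoint.exists():
        # Resume: pick up whatever an earlier run already finished.
        state = json.loads(checkpoint.read_text())

    for name, step in steps:
        if name in state["done"]:
            continue  # completed in a previous run, do not redo it
        note, needs_review = step()
        state["notes"][name] = note
        if needs_review:
            # The place to surface uncertainty for a human to look at.
            state["needs_review"].append(name)
        state["done"].append(name)
        checkpoint.write_text(json.dumps(state))  # checkpoint after every step
    return state
```

If a step raises halfway through, the checkpoint file still holds everything finished before it, so calling `run_job` again with the same path skips the completed steps. The `needs_review` list is what a reviewer reads first.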

They spend review energy where the risk is. The leverage isn’t “read every insertion.” It’s property tests on parsers, contracts between services, scanners that match your stack, and an ADR (architecture decision record) when the agent proposes changing the shape of the system. Automate the paranoia; keep humans for taste and edge cases.
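
For a feel of what “property tests on parsers” means, here’s a hand-rolled sketch using only the standard library. The version parser is a toy I invented for the example. A real team would more likely reach for a property-testing library such as Hypothesis than write the loop by hand.

```python
import random


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of non-negative ints."""
    parts = text.split(".")
    # Require exactly three parts, each made of ASCII digits only.
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a version: {text!r}")
    return (int(parts[0]), int(parts[1]), int(parts[2]))


def format_version(version: tuple[int, int, int]) -> str:
    """Inverse of parse_version for well-formed tuples."""
    return ".".join(str(n) for n in version)


def check_round_trip(trials: int = 1000, seed: int = 0) -> None:
    """Property: parsing a formatted version gives back the original tuple."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        version = tuple(rng.randrange(10_000) for _ in range(3))
        assert parse_version(format_version(version)) == version, version
```

Instead of reviewing every line the agent touched in the parser, you state one rule that must hold for any input and let the machine hunt for a counterexample.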

If you’re earlier in your career

I get the anxiety. The feed makes it sound like typing speed was the whole game.

It wasn’t. The people who were already winning were clear thinkers, careful readers, good at naming the system. That matters more now, not less — because the machine will happily give you nine plausible versions and zero accountability.

The skill that compounds is: know what good looks like, fast. That’s not anti-AI. That’s the only way to use it without losing the plot.

So what?

Models will keep improving. Your org chart won’t magically become more patient.

The teams that feel healthy in 2026 aren’t the ones with the flashiest demos. They’re the ones who stopped pretending delegation is binary and built a little structure around trust, so humans are neither the bottleneck nor the rubber stamp someone reaches for when they’re tired.

Everything else is just vibes. And vibes don’t keep the database up at 3 a.m.
