
There’s a hidden tax on every AI-generated merge request

The New Stack · by Brian Wald · April 2, 2026

AI coding tools haven’t removed bottlenecks. They’ve moved them to the review queue, putting more pressure on senior engineers.

This result isn’t surprising: As teams grow, code generation increases, but review expertise doesn’t scale the same way. If you measure AI adoption by MR (merge request) volume, lines of code, or seat usage, you’re only tracking inputs, not the real bottleneck.

The 2025 DORA (DevOps Research and Assessment) data show that key delivery metrics such as lead time, deployment frequency, change failure rate, and MTTR haven’t improved with increased use of AI tools. Teams with the fewest change failures are also the least likely to use AI-assisted development tools. This doesn’t mean AI tools are harmful, but it’s a reminder not to assume that more MRs mean higher productivity.

The review queue became the sprint plan

Recently, I worked with a customer’s AI enablement engineering team to review their delivery metrics. Their AI coding tool rollout showed strong adoption, but segmenting cycle time by reviewer revealed a different picture. It turned out that engineers with the most system knowledge had review queues so large that reviewing became their primary responsibility, limiting their capacity for design and architecture work.

This pattern is consistent across teams: MR volume increases, review times lengthen, and senior engineers have less time for design. The workload concentrates because a small group holds the deep system context, the security-sensitive areas, and the ownership boundaries. Mid-level engineers cannot step in for senior reviewers on critical changes.

The primary cost is attention fragmentation. Senior engineers face frequent interruptions, leading to predictable declines in quality: superficial approvals due to long queues, or delays when complex MRs wait for available time.

Passing CI doesn’t mean a change is cheap to review

Automated checks can handle more work, but human judgment doesn’t scale the same way. Even if a pipeline passes, reviewers in regulated environments still need to understand the intent, assess the impact, verify authorization boundaries, examine failure behavior, and confirm audit readiness.

AI-generated code increases the burden of verifying plausible correctness. The code may compile, pass tests, and meet linting standards, but reviewers must still confirm it fulfills the intended purpose, handles data classification properly, and avoids policy violations. This verification often takes longer when code is generated for syntactic correctness rather than system-level intent. As queues grow, reviewers spend less time per MR, creating pressure on both speed and quality.

Generation scales, judgment doesn’t

It’s worth acknowledging that AI coding tools do contribute to productivity. They can produce code quickly, accelerate refactoring, and enable teams to address more features per sprint. That is real value for development teams.

However, review capacity is constrained by limited context and personal accountability. When a senior engineer approves a change to an identity service in a regulated environment, they assume organizational risk. That responsibility doesn’t change regardless of how quickly the code was generated.

“Just add AI code review” is the obvious next move. But AI-assisted review cannot yet reliably substitute for the contextual judgment that makes senior review valuable in high-risk codepaths. In practice, accelerating generation without redesigning the review process will not solve the problem.

When AI review actually closed the gap

In one case, AI-assisted review did reduce the workload for senior engineers. The team already had solid CI, clear code ownership, consistent service templates, and review standards written out as checklists. The improvement came from using AI for pre-triage: summarizing intent, flagging policy-related file changes, and matching diffs to known patterns. Senior reviewers focused on high-risk changes, speeding up cycle times without increasing defects.

Conversely, I’ve seen failure when AI coding was broadly implemented without workflow adjustments. In one regulated company, the senior review load increased sharply. Throttling code generation and requiring fewer, larger MRs only created larger batch sizes and harder reviews. The effective solution was workflow redesign: enforcing strict scope rules for small diffs, requiring author summaries and risk declarations, automating policy checks, and rotating trained “risk captains” to handle high-risk triage. Most changes became low-risk by design, with experts focusing on exceptions.

The most important factor wasn’t the tool. The team already had clear review standards and workflows, managed with ownership, iteration, and feedback; AI strengthened them. Where standards are loose, AI triage adds noise that reviewers ignore.

Where the tax compounds: seams, exceptions, and risk ownership

The hardest reviews are about policy and system boundaries: data classification, logging, authorization, and failure handling. At a global bank with more than 4,000 engineers across security, platform, and delivery teams, each group tracked its own metrics. Security cut vulnerabilities, platform improved uptime, engineering sped up deployments, but handoffs between teams caused significant delays. No one owned the full cycle time.

AI-generated volume can worsen this dynamic by increasing boundary-crossing changes unless workflows are designed to contain them. Even adopting smaller MRs as a best practice can hinder flow if each still requires cross-team review, security approval, or compliance documentation.

Measuring the constraint instead of the output

If you don’t measure reviewer capacity, you may misinterpret how AI is affecting your team. These metrics help distinguish real throughput from congestion:

  • MR cycle time segmented by reviewer, not averaged across the org

  • Reviewer load per senior engineer: reviews per day, active queue depth, hours spent in review

  • Defect escape rate, severity-weighted, and split by AI-assisted versus non-AI-assisted MRs

Distribution matters more than averages. A small number of overloaded experts can delay critical systems even when overall metrics look healthy. The key indicator is senior engineers regaining design time while maintaining quality, not the number of MRs opened.
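A minimal sketch of the first metric, cycle time segmented by reviewer rather than averaged across the org, might look like the following. The MR records and reviewer names are invented for illustration:

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical MR records: (assigned reviewer, opened, merged)
mrs = [
    ("alice", datetime(2026, 3, 2), datetime(2026, 3, 3)),
    ("alice", datetime(2026, 3, 2), datetime(2026, 3, 9)),
    ("alice", datetime(2026, 3, 4), datetime(2026, 3, 12)),
    ("bob",   datetime(2026, 3, 3), datetime(2026, 3, 4)),
    ("bob",   datetime(2026, 3, 5), datetime(2026, 3, 6)),
    ("bob",   datetime(2026, 3, 6), datetime(2026, 3, 7)),
]

def cycle_times_by_reviewer(records):
    """Median review cycle time in days, per reviewer (not an org-wide average)."""
    by_reviewer = defaultdict(list)
    for reviewer, opened, merged in records:
        by_reviewer[reviewer].append((merged - opened).days)
    return {r: median(days) for r, days in by_reviewer.items()}

print(cycle_times_by_reviewer(mrs))  # {'alice': 7, 'bob': 1}
```

An org-wide average would blur these two reviewers into one middling number; the per-reviewer split is what surfaces the overloaded expert.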

Adding structure through workflow design

The solution isn’t to ban AI or rubber-stamp approvals. It’s to introduce structure so that increased AI-generated volume doesn’t automatically increase senior engineers’ cognitive load. An example framework could be:

  • Implement pre-review triage as a core step in the workflow. AI can summarize intent, map affected files to risk areas, and flag missing tests, enabling reviewers to start with a clear risk assessment.

  • Establish risk-tiered review paths so that routine changes go to peer review, while changes involving authorization, data handling, or cross-service boundaries are routed to senior reviewers.

  • Attach evidence directly to the MR: threat model notes, data-handling annotations, test results, and policy-check outcomes.

  • Enforce work-in-progress limits for designated reviewers through CODEOWNERS rules and workflow automation.
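The risk-tiered routing step above can be sketched in a few lines. The path prefixes and tier names below are illustrative assumptions; a real implementation would derive them from CODEOWNERS rules and policy metadata:

```python
# Sketch of risk-tiered review routing. The path prefixes and tier names
# are hypothetical examples, not taken from any specific codebase.
HIGH_RISK_PREFIXES = ("auth/", "identity/", "billing/", "migrations/")

def review_tier(changed_files):
    """Route routine changes to peer review; route changes touching
    authorization, data handling, or boundary paths to senior review."""
    if any(path.startswith(HIGH_RISK_PREFIXES) for path in changed_files):
        return "senior-review"
    return "peer-review"

print(review_tier(["auth/token.py", "README.md"]))  # senior-review
print(review_tier(["docs/intro.md"]))               # peer-review
```

The point of the design is that most MRs never reach the senior queue at all; only diffs that cross a declared risk boundary consume expert attention.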

The decision rule for expanding AI usage

A useful leadership test: after AI adoption, did time-in-review for high-risk changes decrease, or did the workload concentrate among fewer senior reviewers? If the latter, code generation is increasing against fixed review capacity.

Treat senior review attention as a governed resource, with explicit limits, routing rules, and escalation procedures. Only expand AI-assisted code generation to new repositories or teams when reviewer workload and defect escape rates remain within acceptable limits.
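The concentration half of that leadership test can be made concrete. One simple signal, sketched below with invented reviewer names, is the share of high-risk reviews handled by the two busiest reviewers; the metric itself is an assumption, not from any standard:

```python
from collections import Counter

def top_two_share(high_risk_reviews):
    """Fraction of high-risk reviews handled by the two busiest reviewers.
    A share that climbs week over week signals concentration, not throughput."""
    counts = Counter(high_risk_reviews)
    busiest = sum(n for _, n in counts.most_common(2))
    return busiest / len(high_risk_reviews)

# Hypothetical week of high-risk review assignments
week = ["alice", "alice", "bob", "alice", "carol", "bob", "alice"]
print(round(top_two_share(week), 2))  # 0.86
```

Tracked weekly, this number answers the question of whether AI adoption is spreading review load or piling it onto the same two people.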

At the start of each week, ask one question: Who are the two people currently limiting the merging of high-risk changes, and what is their current review queue depth? If you can’t answer it, the system lacks visibility. If the number grows each week, you’ve found your hidden tax.
