The Multiplier Problem
The core problem with AI coding agents is that they are multipliers: they amplify whatever they find, good and bad.
Give an agent a codebase with consistent conventions, clear module boundaries, and comprehensive tests, and it will produce code that integrates cleanly and follows your patterns. This code will be instantly ready for review. It can do this faster than any new hire you have ever onboarded.
Give it a codebase with duplicated business logic, scattered directory structure, and three different approaches to error handling, and it will do something worse than fail. It will succeed confidently and at volume, in ways that look correct until they reach production or an auditor's desk.
I work with data engineering teams in aviation, defence, and commercial operations. Over the past twelve months, across four engagements, I have seen both outcomes. The teams that moved fastest with agents were not the ones with the best tools or the most aggressive adoption timelines. They were the ones who spent days - not weeks, days - strengthening their engineering foundations before agents touched production code.
This article covers what that preparation looks like and why skipping it is more expensive than doing it.
What Goes Wrong Without The Foundation
Most teams introduce agents into codebases that were built for humans, not machines. Human engineers carry institutional memory across sessions; they know which module is authoritative, which error-handling pattern the team agreed to use last quarter, and why that utility function in src/ exists and should not be touched. Agents start fresh every session. They reconstruct their understanding of your project from whatever state it’s in.
If that context is contaminated with scattered logic, inconsistent naming, and unclear separation of concerns, the agent will produce problematic code consistently. It's following the patterns in your codebase.
I worked with a commercial aviation team, building flight-ops status and maintenance-event classification systems, that had adopted Claude Code early and aggressively. The codebase had grown organically over several years with many contributors. Business logic was distributed across src/ with no clear ownership, safety-status checks were duplicated in three modules with slightly different edge-case handling, and there was no consistent pattern for logging. When agents entered the workflow, velocity did not increase. Within two sprints, a single maintenance-state rule had three conflicting implementations because each agent session had used a different module as its reference. The review burden tripled. Two of those implementations would have produced incorrect outputs for a specific class of malformed telemetry records - the kind of edge case that surfaces during an audit, not a demo.
The team that moved fastest, an aviation data engineering group building transformation pipelines under strict requirements, took a different approach. Before scaling agent use, they spent the best part of a day restructuring the repository, documenting what belonged in each directory, and consolidating duplicated transformation logic into canonical locations. Within a week, their merged-PR rate had gone from roughly four per sprint to eight, a rate sustained over the following three sprints. This was measured through the same review process they were running before agents entered the workflow. The baseline was a team of five engineers working two-week sprints on a codebase of approximately 60k lines.
The difference was not the agent. It was the codebase the agent was reading.
Five Foundations That Make Agents Productive
These are the specific practices I implement with teams before scaling agent-assisted delivery. They are ordered by the damage they prevent.
1. Consolidate Duplicated Business Logic
This is where I have seen the most damage in regulated codebases, and it's the single highest-leverage fix.
When business logic exists in multiple places, agents fix the version they find first, or they create a new one and leave the originals. Now you have three implementations, and the next agent session has no way to determine which is correct.
One team I worked with had four different implementations of a safety-event field-redaction step across their data pipeline. An agent introduced a fifth. The inconsistency was caught in review, but only because the reviewer happened to know about the other four. In a codebase with thirty or forty pipeline stages, that kind of recall is not a reliable control.
In regulated environments, the highest-risk duplicates are not always where you expect. In multiple aviation engagements, the drift was not in transformation code; it was in exception-handling paths where teams had copy-pasted control logging with slightly different field mappings. That is the kind of inconsistency that auditors find during airworthiness reviews and engineers miss.
Fix first: Identify every location where control-critical logic (data classification, access control, fee calculation, PII handling, validation) is implemented. Consolidate to one authoritative version. This single step eliminates the most common source of agent-produced conflicts.
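Consolidation in practice can be as simple as one canonical module that every caller imports. The sketch below is illustrative, not taken from any of the codebases described: the function name, module path, and field list are all invented.

```python
# pipeline/controls/redaction.py (hypothetical path)
# The single authoritative redaction step. The old per-module copies are
# deleted and their callers import this one function instead.

SAFETY_FIELDS = {"tail_number", "crew_id", "event_narrative"}  # illustrative

def redact_safety_event(record: dict) -> dict:
    """Redact control-critical fields from a safety-event record.

    One implementation, one edge-case policy: unknown keys pass through,
    listed keys are always masked, regardless of which pipeline stage calls.
    """
    return {
        key: ("[REDACTED]" if key in SAFETY_FIELDS else value)
        for key, value in record.items()
    }
```

Every former duplicate becomes a one-line import, so the next agent session finds exactly one reference implementation to pattern-match against.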
2. Make Repository Structure Explicit
Your directory layout is the first signal an agent uses to decide where new code belongs. If that signal is ambiguous, agents will place code inconsistently across sessions.
More importantly, where code lives determines what scrutiny it receives. If control-relevant logic ends up in a utility directory that your review process treats as low-risk, you have a gap between your actual risk and your review coverage.
At minimum:
- Each top-level directory has an explicit contract: what belongs, what does not.
- Business-rule code is separated from adapters (I/O, orchestration, APIs).
- Control-critical logic has a single, known location.
The test is simple: if an engineer or agent cannot answer "where does this change belong?" in under a minute, the structure is not clear enough.
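One way to make those contracts explicit is a short annotated layout at the top of the README. This example is illustrative, not the structure of any team mentioned above:

```
src/
  domain/      # business rules only, no I/O; control-critical logic lives here
  adapters/    # I/O, APIs, orchestration glue; no business rules
  pipelines/   # composition of domain + adapters into runnable jobs
tests/         # mirrors src/; control-critical paths covered first
data/          # never committed; see the ignore policy below
```

A map like this is cheap to write and is read by every agent session, because it sits in the repository itself rather than in someone's head.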
3. Write Tests That Serve As Evidence
Agents are very good at writing code that satisfies a specification. They are less reliable at determining what the correct specification should be.
When you write tests first, you give the agent an unambiguous contract. The code either passes or it does not. This reduces review loops, because the tests define done, and enables faster iteration, because agents can refine against functional targets.
In regulated environments, tests serve a second function: they are audit evidence. When someone asks how you verified that a data transformation behaves correctly, "an AI agent wrote it and a human reviewed it" is a weak answer. "The behaviour is specified in tests that run on every commit, and the agent's output was validated against those specifications" is defensible. Teams operating under strict regulations like EASA/CAA/HIPAA are constantly being asked these questions.
The tests that matter most are not happy-path tests. They are tests for control-relevant edge cases: nulls, malformed records, boundary values, out-of-policy inputs, unauthorised access attempts. This is where agent-assisted productivity gains get lost: agents handle the common path well, but unspecified edge cases in control logic create both bugs and compliance risk.
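A sketch of what such edge-case tests look like, runnable with pytest. The classification function and its rules are invented for this example; the point is that the policy for nulls and malformed values is written down as executable specification, not left for an agent to guess.

```python
# test_maintenance_state.py (illustrative)

def classify_maintenance_state(record: dict) -> str:
    """Toy implementation included so the tests below are runnable."""
    status = record.get("status")
    if status is None:
        return "unknown"         # explicit policy for missing data
    if status not in {"in_service", "maintenance"}:
        return "quarantined"     # out-of-policy values never pass silently
    return status

def test_null_status_is_unknown():
    assert classify_maintenance_state({"status": None}) == "unknown"

def test_malformed_status_is_quarantined():
    assert classify_maintenance_state({"status": "MAINT??"}) == "quarantined"

def test_happy_path_passes_through():
    assert classify_maintenance_state({"status": "in_service"}) == "in_service"
```

Run on every commit, these tests double as the audit evidence described above: each edge-case decision is dated, versioned, and verified automatically.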
4. Enforce One Pattern Per Operation
The most effective way to get consistent output from agents is not elaborate prompting or system instructions. It is a codebase with one obvious way to do things.
Agents read surrounding code and match it. I have tested this across several projects: when a codebase has a single, consistent pattern for error handling, logging, or retry logic, agents reproduce that pattern reliably without explicit instruction. When the codebase has multiple patterns, agents produce yet another variant. The codebase acts as the prompt.
Define these explicitly and consolidate to one approach:
- Retry and resilience patterns
- Error taxonomy and logging shape
- Naming conventions for common operations
- A single coding standard, e.g. PEP8
- Required metadata in docstrings for control-critical functions
If your codebase currently supports three different approaches to error handling, an agent will produce a fourth. In regulated pipelines, where error-handling paths often carry control logic, that inconsistency is not cosmetic; it's a potential finding.
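What "one pattern" can look like for retries, as a hedged sketch: a single sanctioned decorator that all transient I/O goes through. The name and defaults are illustrative, not a prescription.

```python
import functools
import time

def with_retry(attempts: int = 3, base_delay: float = 0.5):
    """The one sanctioned retry wrapper; transient I/O calls use this, only this."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # exhausted: surface the original error
                    # exponential backoff: base_delay, 2x, 4x, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

Once a wrapper like this is the only retry pattern in the codebase, agents reproduce it without instruction, because it is the only example they can find.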
5. Tighten Version-Control Boundaries
Agent workflows generate local state: scratch files, intermediate data, and debug artifacts. In most codebases, this is housekeeping. In regulated environments, it is a compliance exposure.
I have seen agent sessions produce intermediate outputs containing production data samples and write temporary files with schema fragments that included column names from protected datasets. Most worrying of all, I have seen them generate debug logs with values that should never have left the processing environment. None of these were intentional. All of them would have entered version control without a tightened ignore policy. A minimal starting point:
```
# Agent / IDE state
.claude/
.cursor/
*.ai-context

# Data outputs - never track raw or intermediate data
data/raw/
data/processed/
data/intermediate/
*.tmp
*.scratch
```
Then test it: run an agent session, check git status, and verify nothing unexpected appears. In an environment subject to audit, "we caught it in review" is not an acceptable control. The boundary must prevent the leak, not detect it.
The Speed Objection
The pushback I hear most often is: "This sounds like a week of cleanup before we get any benefit."
I understand the resistance. Every team I work with is under immense pressure to deliver now.
Here is what I have observed across four engagements: teams that skip the foundation work see an initial velocity increase that lasts one to two sprints, but then it degrades. In one aviation data team, review cycle time doubled between sprint one and sprint three, not because the agents got worse, but because each session introduced slight variations on existing logic that reviewers had to trace back to source. Pull requests that had been merging in one review round started requiring two or three. The team eventually paused agent use entirely to clean up the drift, losing more time than the foundation work would have cost.
Teams that invest early see slower output initially but then sustained improvement moving forward. Here, "early" means two to three days for a team of four to six engineers working on a codebase under 100k lines. The aviation pipeline team I mentioned earlier maintained their throughput gain over the following eight weeks. The specific change: pull requests went from averaging 2.4 review comments to 0.8, and first-pass approval rate went from roughly 45% to over 75%. Agents were producing code that matched existing patterns closely enough that reviewers were checking logic, not style or placement.
The tradeoff is not speed versus governance. It is a short burst of headline velocity versus sustained, auditable throughput.
Where To Start
You do not need to fix everything before agents become productive. These are ordered by leverage:
- Audit for duplication. Find business logic that exists in more than one place. Start with control-critical paths: data classification, access control, validation, financial calculations. Consolidate the highest-risk duplicates first.
- Clarify directory contracts. Write a one-line description of what belongs in each top-level directory. If you cannot write it, the boundary is not clear enough.
- Add tests to critical paths. You do not need full coverage. You need tests on the code paths where a wrong answer has consequences. In regulated environments, those tests become your evidence that agent-generated code was validated against defined specifications.
- Pick one pattern per common operation. Retries, error handling, logging, control-relevant exception paths. If your codebase has multiple approaches, choose one and consolidate.
- Tighten your ignore policy. Run an agent session, check git status, and fix anything unexpected before the next session.
Most teams are able to complete these steps in two to three days. The improvement compounds from there. Each cleanup makes the next agent session more predictable, and each predictable session builds confidence to delegate more.
What The Checklist Doesn't Cover
The five steps above will handle the structural work, the kind of cleanup any senior engineer can lead. What they will not tell you is which of your duplications carry actual compliance risk versus cosmetic inconsistency, or how your agent-generated code patterns map to the specific regulatory expectations your auditors will apply.
I've seen this directly: one team consolidated their exception-handling paths correctly (single pattern, single location) and still received a finding because the consolidated logic didn't map to a specific field-mapping requirement in their data processing agreement. The structure was right. The regulatory mapping wasn't.

That gap is where teams get stuck. The engineering cleanup is straightforward. Knowing which cleanup matters for your audit posture, and which you can defer, requires someone who has read both the codebase and the compliance framework. If that sounds familiar, I'm happy to talk it through. I do a short scoping call (about 30 minutes) where I'll tell you honestly whether your team needs outside help or whether you can handle it with what's in this article.