DEV Community: Clear Code Intelligence

Technical Debt Has a New Cost Center

Clear Code Intelligence — Sat, 13 Jun 2026 01:38:08 +0000

Technical debt used to be priced mostly in human engineering time.

Now it also shows up as AI-agent operating cost.

When a repository has unclear ownership, weak failure tests, tangled boundaries, generated code without explanation, dependency drift, or large context-heavy modules, AI coding agents do not magically move faster.

They search more files.

They burn more context.

They retry more patches.

They need more human review.

They cost more to operate.

The Token Tax

AI coding agents do not only charge for generating code.

They also charge for every answer they have to infer.

If the repo does not clearly answer basic questions, the agent has to reconstruct the answers:

where is the source of truth?
which module owns this behavior?
what tests prove failure behavior?
is this duplicate logic intentional?
is this generated code safe to modify?
which dependency boundary is allowed?
what can be deleted without breaking production?

Every unclear answer becomes extra context, search, retries, and review work.

That is AI token debt.

What An Audit Should Show

A useful technical debt audit should not be a scanner dump.

It should show:

exact source evidence
active debt vs accepted risk
false positives and scope classification
AI-token-debt drivers
smallest safe remediation path
owner and priority
proof required after cleanup
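As a rough illustration, one audit finding that carries the fields listed above could be represented as plain data. This is a minimal sketch: the field names, the sample file path, and the sample values are hypothetical, not the schema of any real report.

```javascript
// Hypothetical shape for one audit finding; field names are illustrative only.
const REQUIRED_FIELDS = [
  "evidence",          // exact source evidence (file and line range)
  "status",            // e.g. "active-debt", "accepted-risk", "false-positive"
  "scope",             // scope classification, e.g. "production" or "test-fixture"
  "tokenDebtDrivers",  // AI-token-debt drivers
  "remediation",       // smallest safe remediation path
  "owner",
  "priority",
  "proofAfterCleanup", // proof required after cleanup
];

// Returns the names of required fields that are missing or empty.
function missingFields(finding) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = finding[field];
    if (Array.isArray(value)) return value.length === 0;
    return value === undefined || value === null || value === "";
  });
}

// Sample finding with made-up values.
const exampleFinding = {
  evidence: { file: "billing/discounts.js", lines: [2, 5] },
  status: "active-debt",
  scope: "production",
  tokenDebtDrivers: ["duplicated policy logic"],
  remediation: "Route both callers through one discount policy module.",
  owner: "billing-team",
  priority: 2,
  proofAfterCleanup: "Tests cover the discount rule through both entry points.",
};

console.log(missingFields(exampleFinding)); // []
console.log(missingFields({ evidence: { file: "a.js", lines: [1, 1] } }).length); // 7
```

A check like this keeps a finding from reaching the report without an owner or a proof step.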

The goal is not to shame a codebase.

The goal is to make the next change cheaper.

That next change might be made by a human engineer.

It might be made by an AI coding agent.

Either way, the repository has to be easier to reason about.

The Practical Ask

Send one repo.

Identify the files, boundaries, tests, generated-code areas, dependency risks, and ownership gaps that make humans and AI agents burn unnecessary time, tokens, and review cycles.

Then reduce the debt and produce a before/after proof report.

That is where technical debt becomes an operating cost discussion.

What We Learned Scanning React

Clear Code Intelligence — Sat, 13 Jun 2026 01:01:08 +0000

Clear Code Intelligence scanned the public React repository: react/react.

This is not a dunk on React.

React is one of the most important open-source frontend projects in the world. It is also exactly the kind of repository that shows why technical debt reporting has to be more than pattern matching.

Mature framework repositories contain runtime internals, compiler code, server rendering code, DevTools implementation, fixtures, generated expectations, changelog history, compatibility logic, build tooling, and long-lived public API decisions.

If a report treats all of that as the same kind of debt, the report is not useful.

What We Scanned

The Clear Code scan reviewed the public react/react repository and produced a technical diligence PDF report.

The scan measured:

7,228 repository files
7,070 analyzed files
1,033,022 lines of code
250 findings surfaced in the PDF
4,742 raw findings generated before report curation
high AI token debt risk

The raw scorecard was intentionally severe:

Area	Score
Overall raw diligence	1/100
Projected after remediation	19/100
Architecture	6/100
Maintainability	0/100
Security	0/100
Delivery	0/100
AI governance	0/100

That raw score should not be read as "React is bad."

It should be read as "large framework repositories need scope-aware analysis before findings become decisions."

The Real Lesson: Scope Classification

React is a framework/runtime/compiler repository.

That matters.

A generic scanner can flag large files, dense lines, deferred-work markers, duplicated-looking logic, or complex control flow. Those signals are often useful, but they are not automatically equal.

A useful report has to classify evidence into buckets such as:

production runtime implementation
compiler implementation
server rendering implementation
DevTools implementation
generated expectation fixture
changelog or release history
compatibility debt
accepted framework complexity
active remediation candidate
false positive

Without that classification, a report becomes noisy.

With that classification, it becomes decision support.
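One way to picture that classification step is an ordered list of path rules applied before any scoring. The sketch below is an assumption-heavy illustration: the patterns are guesses based on the file paths named in this post, and they are not the scanner's actual rules.

```javascript
// Hypothetical path rules, checked in order; the first match wins.
// The patterns are illustrative guesses, not a real scanner configuration.
const SCOPE_RULES = [
  [/(^|\/)CHANGELOG\.md$/, "changelog or release history"],
  [/(^|\/)(__fixtures__|fixtures)\//, "generated expectation fixture"],
  [/^compiler\//, "compiler implementation"],
  [/^packages\/react-devtools/, "DevTools implementation"],
  [/^packages\/react-server\//, "server rendering implementation"],
  [/^packages\/[^/]+\/src\//, "production runtime implementation"],
];

// Returns the first matching bucket, or "unclassified" so gaps stay visible.
function classifyScope(path) {
  for (const [pattern, scope] of SCOPE_RULES) {
    if (pattern.test(path)) return scope;
  }
  return "unclassified";
}

console.log(classifyScope("CHANGELOG.md"));
console.log(classifyScope("packages/react-server/src/ReactFizzServer.js"));
console.log(classifyScope("packages/react-reconciler/src/ReactFiberWorkLoop.js"));
```

Rule order matters here: fixture paths are matched before the broader compiler and package rules, so a fixture inside a compiler directory is not scored as compiler implementation.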

Where AI Token Debt Shows Up

AI token debt is the extra context, search, inference, retry, and review work created when a codebase is hard to reason about.

React is a strong example because an AI agent working in this repository has to understand multiple layers of context:

public API compatibility
reconciler behavior
server rendering and streaming behavior
React Server Components and Flight surfaces
compiler lowering and generated expectations
DevTools behavior
build/release modes
test fixture intent

That is not a criticism. It is the nature of a mature public framework.

But it does mean an AI agent cannot safely modify many areas by reading one file. It must gather context, inspect related packages, understand fixture semantics, and avoid breaking compatibility assumptions.

That is the token tax.

The more the agent has to infer, the more it burns.

Interesting Hotspots

The scan surfaced deferred-work clusters in areas that are naturally expensive for AI agents to modify:

packages/react-server/src/ReactFizzServer.js
packages/react-server/src/ReactFlightServer.js
packages/react-reconciler/src/ReactFiberCommitWork.js
packages/react-reconciler/src/ReactFiberWorkLoop.js
packages/react-client/src/ReactFlightClient.js
compiler/packages/babel-plugin-react-compiler/src/HIR/BuildHIR.ts
compiler/crates/react_compiler_lowering/src/build_hir.rs

Those are not automatically defects.

They are areas where context matters. A future human or AI agent touching these files needs to understand the surrounding protocol, runtime, compiler, compatibility, and test expectations before changing behavior.

That is exactly what a modern technical debt report should show.

Findings That Need Downgrades

The scan also exposed places where tooling needs to be smarter.

For example:

CHANGELOG.md should not be scored like runtime code.
generated compiler expectation files should not be scored like production implementation files.
fixture names containing todo are often test taxonomy, not unmanaged delivery debt.
long lines in release notes are not the same as long lines in business logic.
framework compatibility comments can represent deliberate tradeoffs, not careless debt.
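A downgrade rule of this kind could be sketched as a small severity adjustment keyed by file scope. The scope names, severity levels, and step counts below are assumptions chosen for illustration, not values taken from the scan.

```javascript
// Severity levels from least to most severe.
const SEVERITY_ORDER = ["info", "low", "medium", "high"];

// Hypothetical downgrade steps per file scope; the numbers are assumptions.
const DOWNGRADE_STEPS = new Map([
  ["changelog", 3],
  ["generated output", 2],
  ["test fixture", 2],
  ["runtime implementation", 0],
]);

// Lowers a raw severity by the number of steps configured for the scope.
// Unknown scopes are left unchanged so nothing is silently hidden.
function interpretSeverity(rawSeverity, scope) {
  const rawIndex = SEVERITY_ORDER.indexOf(rawSeverity);
  if (rawIndex === -1) throw new Error(`unknown severity: ${rawSeverity}`);
  const steps = DOWNGRADE_STEPS.get(scope) ?? 0;
  return SEVERITY_ORDER[Math.max(0, rawIndex - steps)];
}

console.log(interpretSeverity("high", "changelog")); // info
console.log(interpretSeverity("high", "test fixture")); // low
console.log(interpretSeverity("high", "runtime implementation")); // high
```

Keeping the raw severity alongside the interpreted one lets a reviewer see what was downgraded and why.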

This is useful product feedback.

Clear Code needs to keep improving file-scope classification for framework repositories:

changelog
docs
test fixture
generated output
snapshot
benchmark
compiler expectation
runtime implementation
public API surface
accepted compatibility cost

That classification layer is what turns raw findings into executive-grade analysis.

Why This Matters for AI-Assisted Engineering

The next generation of technical debt is not only about human readability.

It is also about AI cost.

When code ownership is unclear, tests do not explain failure behavior, fixtures are indistinguishable from production code, and generated artifacts are mixed with implementation files, AI agents spend more tokens reconstructing context.

That cost appears as:

larger prompts
more file searches
more tool calls
more retries
longer review cycles
higher risk of hallucinated changes
more human supervision

In other words, technical debt now charges interest in both engineering time and AI-agent operating cost.

What We Would Improve Next

The React scan was useful because it showed both the power and the limits of automated reporting.

The next version of the report should:

classify framework repo paths before scoring severity
separate raw score from interpreted score
identify generated and fixture-heavy paths automatically
distinguish accepted compatibility complexity from cleanup candidates
show AI-token-debt drivers by domain area
explain which findings deserve action and which deserve acknowledgement

That is the standard technical leaders need.

Not scanner dumps.

Decision support.

Invitation

Public repositories are useful because the evidence can be inspected and the methodology can be challenged.

If anyone from the React maintainer community wants the full PDF report, we would be glad to share it and hear where the scan should be corrected, tuned, or scoped differently.

Public code deserves public, fair, evidence-backed analysis.

What We Learned Scanning Netflix Atlas

Clear Code Intelligence — Fri, 12 Jun 2026 21:10:37 +0000

Clear Code Intelligence scanned a public Netflix repository: Netflix/atlas.

This is not a dunk on Netflix.

It is a public-code methodology test.

After scanning Google zx and Microsoft agent-framework, we wanted a different kind of repository. Netflix Atlas is an observability and telemetry project with a mature platform-engineering shape. It is mostly Scala, and it includes query/evaluator logic, API modules, language-server tooling, resource files, tests, and platform integration code.

That makes it a useful scan target because it tests whether a technical debt report can understand domain context.

What We Scanned

The Clear Code scan reviewed the public Netflix/atlas repository and produced a technical diligence PDF report.

The scan measured:

1,247 repository files
706 analyzed files
89,113 lines of code
186 report findings
high AI token debt risk

The scorecard was mixed:

Area	Score
Overall diligence	35/100
Projected after remediation	53/100
Delivery	96/100
Open source readiness	83/100
Architecture	45/100
Maintainability	0/100
AI governance	0/100

The delivery and open-source signals were strong. That matters because a serious report should not only criticize. It should show where the repository is already strong.

The Important Lesson Is Classification

Atlas is an observability/query system.

That means some findings require domain-aware interpretation.

For example, a generic scanner can flag evaluator-style code as dynamic execution. But in a query language, expression evaluation may be expected product behavior. The real report question is not simply "is there eval-like behavior?"

The better questions are:

Is this expected DSL/query behavior?
Is user input constrained?
Is execution sandboxed or bounded?
Are failure modes tested?
Are ownership boundaries clear?
Is this active debt or accepted design?

That distinction matters.

A scanner dump can find a pattern.

A useful technical debt report has to explain what the pattern means.

Where AI Token Debt Appears

AI token debt is the extra AI-agent context, search, inference, retry, and review work created when a codebase is hard to reason about.

The Atlas scan modeled high AI token debt because of:

complexity drag
context sprawl
large-context files
deferred decisions
dependency uncertainty

Some context hotspots included:

atlas-lsp/src/main/scala/com/netflix/atlas/lsp/AslDocumentAnalyzer.scala
atlas-core/src/main/scala/com/netflix/atlas/core/stacklang/Interpreter.scala
atlas-webapi/src/main/scala/com/netflix/atlas/webapi/ExprApi.scala
atlas-postgres/src/main/scala/com/netflix/atlas/postgres/SqlUtils.scala
atlas-pekko/src/main/scala/com/netflix/atlas/pekko/StreamOps.scala

The key point is not that large files are automatically bad.

The key point is that AI agents pay for ambiguity.

When a future agent needs to modify query behavior, language-server behavior, expression parsing, or web API behavior, it has to reconstruct domain context before it can safely change the code. The more concentrated that context is, the more the agent spends on search, inference, retries, and human review.

False Positives Are Product Feedback

The scan also exposed places where tooling should improve.

For example:

palette resource files are not the same as large runtime modules
postgres/postgres in a local test suite is not the same as a leaked production credential
syntax-highlighting token names are not credentials
query/evaluator logic needs domain context
benchmark modules should not be scored the same way as production paths

That does not make the scan useless.

It makes the scan useful product feedback.

Technical debt tooling needs scope classification:

production runtime code
test fixture
local-only config
static resource
generated asset
benchmark code
expected domain behavior
active debt
accepted risk
false positive

Without that layer, reports become noisy.

With that layer, reports become decision support.

Why Public Scans Matter

Public repositories are useful because the evidence can be inspected and the methodology can be challenged.

The goal is not to shame maintainers.

The goal is to make technical debt analysis concrete:

exact source evidence
confidence level
scope classification
domain interpretation
remediation path
verification expectation
AI-agent cost driver

If anyone from Netflix Open Source or the Atlas maintainer community wants the full PDF report, we would be glad to share it and hear where the scan should be corrected, tuned, or scoped differently.

Public code deserves public, fair, evidence-backed analysis.

What We Learned Scanning Microsoft's Public Agent Framework Repository

Clear Code Intelligence — Fri, 12 Jun 2026 21:03:24 +0000

Clear Code Intelligence scanned a public Microsoft repository: microsoft/agent-framework.

This is not a dunk on Microsoft.

It is a public-code methodology test.

Microsoft's public GitHub organization is verified and publishes thousands of open-source repositories. microsoft/agent-framework is especially relevant because it is a framework for production-grade AI agents and multi-agent workflows.

That makes it a strong example of a new technical debt problem:

Large AI-agent frameworks need scope-aware technical debt reporting.

What We Scanned

The Clear Code scan reviewed the public microsoft/agent-framework repository and produced a 31-page technical diligence report.

The scan measured:

4,620 analyzed files
703,170 lines of code
250 report findings
1,156 raw managed findings
high AI token debt risk

The scorecard was severe:

Area	Score
Overall diligence	29/100
Projected after remediation	47/100
Architecture	100/100
Delivery	70/100
Maintainability	0/100
AI governance	0/100

That raw result needs careful interpretation.

This is a large repository with Python packages, .NET packages, frontend tooling, samples, documentation, test fixtures, generated-looking assets, and integration examples. A useful technical debt report cannot treat all of those scopes the same way.

The Important Lesson Is Scope

One example from the scan illustrates the point.

The scanner flagged an AWS access-key-shaped value in sample documentation:

AWS_ACCESS_KEY_ID | AKIAIOSFODNN7EXAMPLE

That value looks like an AWS access key pattern.

But it also matches the placeholder access key ID that AWS uses in its own documentation.

A noisy scanner would call this a breach.

A serious technical debt report should classify it as one of:

documentation example
active secret
test fixture
false positive
accepted risk
missing safe-example annotation

That classification step is critical.

The value still deserves evidence and review. But the remediation is probably not "rotate production credentials." More likely, the remediation is to make the example classification explicit so humans and AI agents do not keep rediscovering the same context.
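A classification step for this case might look like the sketch below. It assumes a hypothetical allowlist of known documentation placeholders and a few guessed path conventions for docs and samples. The shape check only matches the `AKIA` prefix followed by 16 uppercase letters or digits, so it is a triage aid rather than a secret detector.

```javascript
// Placeholder access key IDs that appear in vendor documentation (assumed allowlist).
const KNOWN_EXAMPLE_KEYS = new Set(["AKIAIOSFODNN7EXAMPLE"]);

// Shape check only: "AKIA" followed by 16 uppercase letters or digits.
const ACCESS_KEY_SHAPE = /^AKIA[0-9A-Z]{16}$/;

// Guessed path conventions for documentation and sample content.
const DOC_LIKE_PATH = /(^|\/)(docs?|samples?|examples?)\/|\.md$/i;

// Returns a triage label; every label except "not-a-key-shape" still needs evidence.
function classifyAccessKey(value, filePath) {
  if (!ACCESS_KEY_SHAPE.test(value)) return "not-a-key-shape";
  if (KNOWN_EXAMPLE_KEYS.has(value)) return "documentation-example";
  if (DOC_LIKE_PATH.test(filePath)) return "needs-review-likely-example";
  return "needs-review-possible-secret";
}

console.log(classifyAccessKey("AKIAIOSFODNN7EXAMPLE", "samples/README.md"));
// documentation-example
```

Recording the label next to the finding is what keeps the next scan, and the next agent, from rediscovering the same context.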

Scanner Dumps Are Not Enough

In a large AI-agent repository, raw findings mix very different things:

core runtime code
generated frontend assets
sample applications
docs and READMEs
test fixtures
dependency metadata
security-sensitive examples
multi-language package boundaries

If those are all scored as one undifferentiated bucket, the report can become technically correct but operationally weak.

The better report should answer:

Is this production runtime code?
Is this sample code?
Is this generated code?
Is this documentation?
Is this an accepted risk?
Is this a real secret or example credential?
Is this an AI-agent reasoning hotspot?

That is the difference between "we found 1,156 things" and "here is the remediation plan."

The AI Token Debt Signal

The strongest signal from the scan was AI token debt.

AI token debt is the extra AI-agent context, search, inference, retry, and review work created when a codebase is hard to reason about.

For microsoft/agent-framework, the scan modeled high AI token debt because the repository contains:

703,170 LOC
4,620 analyzed files
89 large files
62 complex files
386 dependency signals
72 files with deferred-work markers

A few context hotspots stood out:

python/packages/openai/agent_framework_openai/_chat_client.py
python/packages/core/agent_framework/observability.py
python/packages/core/agent_framework/security.py
python/packages/core/agent_framework/_agents.py
multiple DevUI frontend files above 1,500 LOC

The issue is not that large files are automatically bad.

The issue is that AI agents pay for ambiguity in tokens.

When context is spread across Python, .NET, frontend tools, samples, package boundaries, docs, and dependency policy, an agent needs more context to make safe changes. It searches more. It retries more. It asks for more review. It has to infer which code is production-critical and which code is illustrative.

That is technical debt in an AI-assisted engineering environment.

Strong Architecture Can Still Have AI-Agent Friction

One of the most useful parts of the scan was that architecture scored 100/100.

That prevents an overly simplistic conclusion.

The repo did not look structurally chaotic in the scanner's architecture model. The friction came from a different layer:

context size
classification gaps
long files
dependency uncertainty
deferred-work markers
mixed production/sample/documentation scopes

That is exactly why technical debt needs a richer model in the AI era.

The question is not only "is the architecture clean?"

The question is also:

How much work does the codebase force every future engineer and AI agent to do before they can safely change it?

What Clear Code Needs To Improve

This scan also teaches us something about our own product.

Clear Code needs stronger scope classification for large public repositories:

production package
sample package
docs
generated assets
test fixtures
demo credentials
accepted risk
false positive

That classification would make the score more useful and the remediation plan more credible.

The best technical debt report is not the harshest report.

It is the report that helps maintainers decide what to do next.

Why Public Scans Matter

Public repositories let technical debt discussions become concrete.

The evidence is inspectable. The methodology can be challenged. Maintainers can correct the interpretation.

That is the right standard.

If anyone from Microsoft Open Source or the Agent Framework maintainer community wants the full PDF report, we would be glad to share it and hear where the scan should be corrected, tuned, or scoped differently.

Public code deserves public, fair, evidence-backed analysis.

What We Learned Scanning Google's Public zx Repository

Clear Code Intelligence — Fri, 12 Jun 2026 20:46:21 +0000

Clear Code Intelligence scanned a public Google repository: google/zx.

This is not a dunk on Google.

It is a public-code methodology test.

Google's public GitHub organization is verified and publishes thousands of open-source repositories. zx is a useful scan target because it is popular, developer-facing, and intentionally close to shell execution workflows.

That makes it a good example of a hard problem in technical debt reporting:

What should a report do when a pattern looks risky, but that pattern may also be part of the product's intended surface area?

What We Scanned

The Clear Code scan reviewed the public google/zx repository and produced a 29-page technical diligence report.

The scan measured:

129 analyzed files
20,216 lines of code
37 findings
6 high severity findings
12 medium severity findings
19 low severity findings

The scorecard was mixed, which is exactly what makes the repository interesting:

Area	Score
Overall diligence	54/100
Architecture	100/100
Delivery	81/100
Open source readiness	68/100
Maintainability	45/100
AI governance	32/100

The architecture signal was strong. The scan found clear structural signals and no dependency cycles.

The debt was concentrated elsewhere: governance, context hotspots, execution-surface classification, and AI-agent reasoning cost.

The Most Important Finding Was Context

A generic scanner can flag dynamic execution or shell execution patterns.

But zx is a shell scripting tool. That means execution-related findings cannot be interpreted the same way they would be interpreted in a normal web application.

For example, the scan found execution-surface evidence in files such as:

// src/core.ts
this._zurk = exec({
  cmd: self.fullCmd,
  cwd,
});

That evidence matters.

But it does not automatically mean "remove this."

The better report questions are:

Is this intended product surface?
Is this accepted risk?
Is this missing hardening?
Is this missing documentation?
Is this missing test coverage?
Is this a false positive?

That distinction is the difference between a scanner dump and a useful technical debt report.

Strong Repositories Still Have Diligence Gaps

One of the useful lessons from scanning a high-profile public repository is that technical debt is not a binary label.

The report found several positive signals:

strong architecture score
test presence
CI presence
no detected dependency cycles
clear public repository identity

It also found governance gaps that are common in open-source diligence:

missing SECURITY.md
missing CODEOWNERS
missing dependency automation
fixture package manifests without lockfile or license metadata

Those are not dramatic findings.

But they matter because enterprise users and AI-assisted maintainers need more than working code. They need routing, ownership, disclosure process, dependency controls, and explicit evidence.

The AI Token Debt Angle

The most interesting signal was AI token debt.

AI token debt is the extra AI-agent context, search, inference, retry, and review work created when a codebase is hard to reason about.

The scan modeled google/zx as high AI token debt risk:

3.2x modeled input context versus a clean, well-evidenced repository
2.1x modeled rewrite output
2.4x modeled review load
primary hotspot: src/core.ts
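To make the multipliers concrete, here is a worked sketch of how they could translate into per-task token cost. Only the 3.2x and 2.1x multipliers come from the scan. The baseline token counts and the prices are hypothetical assumptions. The 2.4x review load is human time and is left out of the token arithmetic.

```javascript
// Multipliers reported by the scan for google/zx.
const multipliers = { inputContext: 3.2, rewriteOutput: 2.1 };

// Assumed baseline for one task in a clean repository (hypothetical numbers).
const baseline = { inputTokens: 20000, outputTokens: 2000 };

// Assumed prices in USD per million tokens (hypothetical, not a real price list).
const pricePerMillion = { input: 3, output: 15 };

// Token cost of one task in USD under the assumed prices.
function taskCostUsd(tokens) {
  return (
    (tokens.inputTokens * pricePerMillion.input +
      tokens.outputTokens * pricePerMillion.output) / 1e6
  );
}

// Apply the modeled multipliers to the baseline token counts.
const modeled = {
  inputTokens: baseline.inputTokens * multipliers.inputContext,
  outputTokens: baseline.outputTokens * multipliers.rewriteOutput,
};

const costRatio = taskCostUsd(modeled) / taskCostUsd(baseline);
console.log(costRatio.toFixed(2)); // 2.83
```

Under these assumptions the same task costs roughly 2.8 times as much in tokens, before any retries or extra review are counted. A different price mix would shift the ratio between 2.1 and 3.2.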

The point is not that zx is unusually large. It is not.

The point is that AI-agent cost is not determined only by repository size.

It is determined by how much the agent has to infer.

In the scan, src/core.ts stood out as the dominant context hotspot:

976 LOC
174 branch tokens
high recent churn signal
multiple execution-related evidence points
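The post does not define how branch tokens are counted, so the following is only a crude sketch of one possible approach: counting branching keywords and operators with a regular expression and relating them to non-blank lines. It also counts matches inside comments and strings, and the token list is an assumption.

```javascript
// Assumed set of branch tokens: common keywords plus &&, || and ?.
const BRANCH_PATTERN = /\b(?:if|else|for|while|case|catch)\b|&&|\|\||\?/g;

// Counts non-blank lines and branch tokens in a source string.
function branchDensity(source) {
  const loc = source.split("\n").filter((line) => line.trim() !== "").length;
  const branchTokens = (source.match(BRANCH_PATTERN) || []).length;
  const perHundredLines = loc === 0 ? 0 : (branchTokens / loc) * 100;
  return { loc, branchTokens, perHundredLines };
}

const sample = [
  "function pick(a, b) {",
  "  if (a && b) {",
  "    return a;",
  "  } else {",
  "    return b;",
  "  }",
  "}",
].join("\n");

console.log(branchDensity(sample).branchTokens); // 3

// The same ratio for the reported src/core.ts figures: 174 tokens over 976 LOC.
console.log(((174 / 976) * 100).toFixed(1)); // 17.8
```

A density figure like this says nothing about correctness. It only hints at how many paths a reader or agent has to hold in mind per screen of code.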

For human maintainers, this means review and ownership concentration.

For AI agents, it means more context loading, more search, more patch retries, and more human validation.

What A Better Report Should Do

This scan reinforced a core Clear Code belief:

Technical debt reports should not only list findings.

They should classify findings.

A useful report should separate:

active debt
accepted risk
expected product behavior
generated/vendor code
governance gaps
false positives
remediated findings
findings that need verification

That matters even more for AI-assisted development.

If the report does not preserve context, every future engineer and every future AI agent has to rediscover the same reasoning.

Why Public Scans Matter

Public repositories are useful teaching material because the evidence is inspectable.

The point is not to shame maintainers.

The point is to make technical debt analysis more concrete:

exact source evidence
clear confidence level
fair interpretation
remediation options
governance implications
AI-agent cost drivers

That is the standard technical debt tooling needs to move toward.

If anyone from Google Open Source or the zx maintainer community wants the full PDF report, we would be glad to share it and hear where the scan should be corrected, tuned, or interpreted differently.

Public code deserves public, fair, evidence-backed analysis.

How To Measure AI Token Debt In A Real Codebase

Clear Code Intelligence — Fri, 12 Jun 2026 20:30:08 +0000

AI token debt is the extra AI-agent context, repository search, inference, retry, and validation work created when a codebase is hard to reason about.

It is not a special fee from a model provider.

It is an operating-cost pattern.

When a repository is clear, an AI coding agent can usually answer the important questions cheaply:

where the behavior lives
which module owns it
what tests prove it
what can be safely changed
what failure modes matter
what code should not be touched

When a repository is unclear, the same task becomes more expensive. The agent reads more files, performs more searches, retries more patches, and asks the human reviewer to validate more assumptions.

That is the practical meaning of AI token debt.

The Measurement Problem

Most technical debt metrics were built for human maintainability. They count issues, complexity, duplication, vulnerable dependencies, missing tests, or style problems.

Those signals still matter. But AI-assisted development adds another question:

How much extra context does this repository force every future agent and engineer to reconstruct?

That question cannot be answered by lines of code alone.

A 40,000-line codebase with clean ownership, strong tests, explicit boundaries, and clear naming may be cheaper for an agent to work inside than a 7,000-line codebase full of duplicated policies, weak tests, and cross-domain side effects.

The cost is not size. The cost is inference.

Signal 1: Context Sprawl

Context sprawl appears when one change requires the agent to inspect unrelated parts of the system.

Example:

// checkout/complete-order.js
import { updateInventory } from "../warehouse/inventory.js";
import { createInvoice } from "../billing/invoices.js";
import { sendCampaignEmail } from "../marketing/campaigns.js";
import { syncCustomerProfile } from "../crm/sync.js";

export async function completeOrder(order) {
  await updateInventory(order.items);
  await createInvoice(order.customerId, order.total);
  await sendCampaignEmail(order.customerEmail, "order-complete");
  await syncCustomerProfile(order.customerId);
}

This code may work. But it collapses warehouse, billing, marketing, and CRM behavior into one workflow. If an agent is asked to adjust the email behavior, it still has to reason about inventory, billing, and CRM side effects because they share the same execution boundary.

A cleaner interface lowers future context cost:

export async function completeOrder(order, services) {
  await services.inventory.reserve(order.items);
  await services.billing.createInvoice(order.customerId, order.total);
  await services.notifications.orderCompleted(order.customerEmail);
  await services.customerProfile.recordOrder(order.customerId);
}

The second version does not magically solve architecture. But it makes dependencies visible. That matters because visible boundaries reduce search and inference.
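One practical effect of the injected version is that a single domain can be exercised with stand-in services. The sketch below repeats the function so it runs on its own, and uses hypothetical recording stubs. The stub helper and the sample order are assumptions for illustration.

```javascript
// Same shape as the injected version above, repeated so the sketch is self-contained.
async function completeOrder(order, services) {
  await services.inventory.reserve(order.items);
  await services.billing.createInvoice(order.customerId, order.total);
  await services.notifications.orderCompleted(order.customerEmail);
  await services.customerProfile.recordOrder(order.customerId);
}

// Hypothetical recording stubs: each call is logged instead of touching a real system.
function makeRecordingServices(calls) {
  const record = (name) => async (...args) => {
    calls.push({ name, args });
  };
  return {
    inventory: { reserve: record("inventory.reserve") },
    billing: { createInvoice: record("billing.createInvoice") },
    notifications: { orderCompleted: record("notifications.orderCompleted") },
    customerProfile: { recordOrder: record("customerProfile.recordOrder") },
  };
}

const calls = [];
const order = {
  items: ["sku-1"],
  customerId: "c-42",
  customerEmail: "a@example.com",
  total: 120,
};

completeOrder(order, makeRecordingServices(calls)).then(() => {
  // Lists the four service calls in the order they were made.
  console.log(calls.map((call) => call.name));
});
```

An agent asked to change the notification step can read the `notifications` interface and its stub without opening the warehouse, billing, or CRM modules.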

Signal 2: Duplicated Policy Logic

Duplicated business rules are expensive for AI agents because the agent has to decide whether two similar blocks represent the same policy, a legacy branch, an intentional override, or an accidental copy.

// billing/discounts.js
export function applyDiscount(customer, amount) {
  if (customer.plan === "enterprise" && customer.monthsActive > 12) {
    return amount * 0.85;
  }
  return amount;
}

// checkout/pricing.js
export function calculateFinalPrice(user, subtotal) {
  if (user.accountType === "enterprise" && user.monthsActive > 12) {
    return subtotal * 0.85;
  }
  return subtotal;
}

The debt is not only duplication. The debt is semantic ambiguity.

An agent has to ask:

Are customer.plan and user.accountType the same concept?
Which path is authoritative?
Should both files be updated?
Are there production paths that still use the older version?
What test proves the correct behavior?

The remediation should create one policy boundary:

export function enterpriseDiscountRate(account) {
  if (account.type === "enterprise" && account.monthsActive > 12) {
    return 0.15;
  }
  return 0;
}

The goal is not elegance. The goal is to remove the need for future agents to infer which policy is real.
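As a sketch of how the two earlier call sites might delegate to that boundary, each caller can map its own field names onto the shared account shape. This assumes that `customer.plan` and `user.accountType` really are the same concept, which is exactly what the team would need to confirm before merging the paths.

```javascript
// The single policy boundary from above, repeated so the sketch is self-contained.
function enterpriseDiscountRate(account) {
  if (account.type === "enterprise" && account.monthsActive > 12) {
    return 0.15;
  }
  return 0;
}

// Hypothetical adapter for the billing path: maps customer.plan onto account.type.
function applyDiscount(customer, amount) {
  const rate = enterpriseDiscountRate({
    type: customer.plan,
    monthsActive: customer.monthsActive,
  });
  return amount * (1 - rate);
}

// Hypothetical adapter for the checkout path: maps user.accountType onto account.type.
function calculateFinalPrice(user, subtotal) {
  const rate = enterpriseDiscountRate({
    type: user.accountType,
    monthsActive: user.monthsActive,
  });
  return subtotal * (1 - rate);
}

const viaBilling = applyDiscount({ plan: "enterprise", monthsActive: 18 }, 200);
const viaCheckout = calculateFinalPrice({ accountType: "enterprise", monthsActive: 18 }, 200);
console.log(viaBilling === viaCheckout); // true
```

Both entry points keep their existing signatures, so callers do not change while the rule itself lives in one place.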

Signal 3: Weak Executable Context

Tests are not only quality gates. For AI-assisted engineering, strong tests are executable context.

A weak test tells an agent very little:

test("creates invoice", async () => {
  const invoice = await createInvoice(customerId);
  expect(invoice.status).toBe("created");
});

A stronger test explains the system contract:

test("does not create duplicate invoices for the same idempotency key", async () => {
  const first = await createInvoice(customerId, { idempotencyKey: "order-123" });
  const second = await createInvoice(customerId, { idempotencyKey: "order-123" });

  expect(second.id).toBe(first.id);
  expect(await invoiceRepository.countForCustomer(customerId)).toBe(1);
});

This reduces token debt because the agent no longer has to infer the failure behavior from implementation details. The test states the contract.
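For readers who want to see the contract in motion, here is a minimal in-memory sketch that would satisfy the stronger test. The service factory, its storage, and the combined customer-and-key lookup are hypothetical. A real implementation would persist invoices and handle concurrent requests.

```javascript
// Hypothetical in-memory invoice service keyed by customer and idempotency key.
function makeInvoiceService() {
  const byKey = new Map();
  let nextId = 1;
  return {
    // Returns the existing invoice when the same key is reused for a customer.
    async createInvoice(customerId, { idempotencyKey }) {
      const key = `${customerId}:${idempotencyKey}`;
      if (byKey.has(key)) return byKey.get(key);
      const invoice = { id: nextId++, customerId, status: "created" };
      byKey.set(key, invoice);
      return invoice;
    },
    // Counts stored invoices for one customer.
    countForCustomer(customerId) {
      return [...byKey.values()].filter(
        (invoice) => invoice.customerId === customerId
      ).length;
    },
  };
}

const service = makeInvoiceService();

(async () => {
  const first = await service.createInvoice("customer-1", { idempotencyKey: "order-123" });
  const second = await service.createInvoice("customer-1", { idempotencyKey: "order-123" });
  console.log(second.id === first.id, service.countForCustomer("customer-1")); // true 1
})();
```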

A Practical AI Token Debt Scorecard

A useful report should estimate AI token debt from structural signals:

Signal	Why it increases AI-agent cost	What reduces it
High fan-in modules	Many callers must be considered before a change is safe	Split ownership, interfaces, targeted tests
Duplicated policy logic	Agents must infer which rule is authoritative	Single policy module, migration tests
Broad orchestration files	One edit drags in multiple domains	Explicit service interfaces
Weak failure tests	Agents guess behavior under stress	Executable context for edge cases
Unexplained generated code	Future agents reverse-engineer intent	Explanation coverage and review notes
Review churn hotspots	Humans already disagree about meaning	Ownership, design notes, smaller modules

This kind of scorecard is more useful than a raw issue count because it explains why future work will cost more.
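The first row of the scorecard, high fan-in modules, is one of the easier signals to estimate. A minimal sketch, assuming a hypothetical import graph that has already been extracted from the repository:

```javascript
// Hypothetical import graph: each key imports the modules in its list.
const imports = {
  "checkout/complete-order.js": ["billing/invoices.js", "warehouse/inventory.js"],
  "checkout/pricing.js": ["billing/discounts.js"],
  "billing/invoices.js": ["billing/discounts.js"],
  "admin/refunds.js": ["billing/invoices.js", "billing/discounts.js"],
};

// Counts how many distinct modules import each target, highest first.
function fanIn(graph) {
  const counts = new Map();
  for (const targets of Object.values(graph)) {
    for (const target of new Set(targets)) {
      counts.set(target, (counts.get(target) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(fanIn(imports)[0]); // [ 'billing/discounts.js', 3 ]
```

A module at the top of this list is one where a small edit forces the agent to consider every caller, which is why split ownership and targeted tests appear in the third column.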

The Business Interpretation

Technical debt has always charged interest through slower delivery and higher risk.

AI changes the interest mechanism.

The interest now appears as:

larger prompts
more repository search
more failed patches
more manual validation
more review cycles
more uncertainty around generated code

That means technical debt is becoming part of AI governance. If leaders are investing in AI coding tools, they should also measure whether the codebase is becoming easier or harder for agents to reason about.

What A Good Report Should Produce

A useful AI-era technical debt report should include:

Exact source evidence.
The debt category.
The operational impact.
The AI-agent cost driver.
The smallest practical remediation.
The tests or proof required after cleanup.
A priority order.

The goal is not to shame the codebase.

The goal is to make the next change cheaper.

That is the real value of reducing AI token debt.

The Token Tax of Technical Debt

Clear Code Intelligence — Fri, 12 Jun 2026 10:46:45 +0000

AI coding does not make technical debt disappear.

It changes the way technical debt charges interest.

Before AI-assisted delivery, the cost of technical debt showed up as slow onboarding, fragile releases, confusing ownership, duplicated work, and long debugging sessions. Those costs still exist. But there is now another layer: every AI agent that touches a messy repository has to spend more context, more tool calls, more retries, and more validation effort just to understand what the system is supposed to do.

That is the token tax of technical debt.

The model provider does not charge a separate "technical debt fee." The bill shows up indirectly. More unclear code means more prompt context. More brittle boundaries mean more code search. More missing tests mean more explanation and manual verification. More duplicated logic means more repeated reasoning.

For engineering leaders, this matters because AI-assisted software delivery is not only a productivity conversation. It is becoming an operating-cost and governance conversation.

Where The Token Tax Comes From

The most expensive codebases for AI agents are not always the largest codebases.

The expensive codebases are the ones where the agent cannot cheaply answer basic questions:

Where is the real source of truth?
Which module owns this behavior?
What tests prove the failure mode?
Which dependency is allowed to call which boundary?
Is this duplicated intentionally or accidentally?
What is safe to change without creating a regression?

If the repository cannot answer those questions clearly, the agent has to infer them. Inference burns context.

Example 1: Duplicated Business Logic

Duplicated logic is not only a maintenance problem. It is an AI-context problem.

// billing/discounts.js
export function applyDiscount(customer, amount) {
  if (customer.plan === "enterprise" && customer.monthsActive > 12) {
    return amount * 0.85;
  }

  if (customer.plan === "startup" && amount > 500) {
    return amount * 0.9;
  }

  return amount;
}

// checkout/pricing.js
export function calculateFinalPrice(user, subtotal) {
  if (user.accountType === "enterprise" && user.monthsActive > 12) {
    return subtotal * 0.85;
  }

  if (user.accountType === "startup" && subtotal > 500) {
    return subtotal * 0.9;
  }

  return subtotal;
}

A human reviewer sees the problem quickly: the same pricing rule is split across two modules with different naming.

An AI agent has to ask more questions:

Are customer.plan and user.accountType the same concept?
Which implementation is authoritative?
If the discount changes, should both files change?
Is one path legacy?
Are there tests proving both paths?

That uncertainty turns a simple change into a wider repository search.

The remediation is not just "remove duplication." A useful technical debt finding should recommend a safer path:

// pricing/discount-policy.js
export function calculateDiscountRate(account) {
  if (account.type === "enterprise" && account.monthsActive > 12) {
    return 0.15;
  }

  if (account.type === "startup" && account.purchaseAmount > 500) {
    return 0.1;
  }

  return 0;
}

export function applyDiscount(account, amount) {
  return amount * (1 - calculateDiscountRate({
    type: account.type,
    monthsActive: account.monthsActive,
    purchaseAmount: amount
  }));
}

The better version creates a single policy boundary. It gives humans and agents one place to reason from.
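
As a rough sketch of how the two original call sites might migrate onto that single policy, each can become a thin adapter that maps its own field name onto the shared account shape. The adapter bodies below are assumptions for illustration. Whether customer.plan and user.accountType really describe the same concept is something the team has to confirm before merging the rules.

```javascript
// Shared policy, repeated from above so the sketch runs on its own.
function calculateDiscountRate(account) {
  if (account.type === "enterprise" && account.monthsActive > 12) return 0.15;
  if (account.type === "startup" && account.purchaseAmount > 500) return 0.1;
  return 0;
}

// Hypothetical billing adapter: maps customer.plan onto the shared shape.
function applyDiscount(customer, amount) {
  const rate = calculateDiscountRate({
    type: customer.plan,
    monthsActive: customer.monthsActive,
    purchaseAmount: amount,
  });
  return amount * (1 - rate);
}

// Hypothetical checkout adapter: maps user.accountType onto the same shape.
function calculateFinalPrice(user, subtotal) {
  const rate = calculateDiscountRate({
    type: user.accountType,
    monthsActive: user.monthsActive,
    purchaseAmount: subtotal,
  });
  return subtotal * (1 - rate);
}

const billed = applyDiscount({ plan: "enterprise", monthsActive: 24 }, 1000);
const checkedOut = calculateFinalPrice({ accountType: "enterprise", monthsActive: 24 }, 1000);
console.log(billed === checkedOut); // both paths now apply the same rule
```

Keeping the old function names as adapters lets existing callers stay unchanged while the rule itself lives in one place.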

Example 2: Missing Failure Behavior

Weak tests also create token tax.

test("creates an invoice", async () => {
  const invoice = await createInvoice(customerId);
  expect(invoice.status).toBe("created");
});

This test proves the happy path. It does not explain what happens when payment authorization fails, when the customer is missing, when the billing provider times out, or when idempotency is required.

An AI agent asked to modify billing behavior now has to inspect implementation details, dependencies, logs, and call sites to infer the missing contract.

A stronger test suite reduces future reasoning cost:

test("does not create duplicate invoices for the same idempotency key", async () => {
  const first = await createInvoice(customerId, { idempotencyKey: "order-123" });
  const second = await createInvoice(customerId, { idempotencyKey: "order-123" });

  expect(second.id).toBe(first.id);
  expect(await invoiceRepository.countForCustomer(customerId)).toBe(1);
});

test("marks invoice as payment_pending when authorization times out", async () => {
  paymentGateway.authorize.mockRejectedValue(new TimeoutError());

  const invoice = await createInvoice(customerId);

  expect(invoice.status).toBe("payment_pending");
  expect(invoice.retryAfter).toBeDefined();
});

These tests are not just quality gates. They are executable context.

They reduce the number of assumptions that every future engineer and every future AI agent has to make.
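
To make the contract concrete, here is a minimal in-memory sketch of a createInvoice that would satisfy the behavior those tests describe. Everything in it is an assumption for illustration: the TimeoutError class, the authorize option passed in place of a mocked payment gateway, the Map used as storage, and the 30-second retry delay.

```javascript
// Hypothetical error type; a real payment client would define its own.
class TimeoutError extends Error {}

// In-memory stand-in for an invoice table keyed by idempotency key.
const invoicesByKey = new Map();
let nextInvoiceId = 1;

async function createInvoice(customerId, { idempotencyKey, authorize } = {}) {
  // Replaying a known key returns the original invoice instead of a new one.
  if (idempotencyKey && invoicesByKey.has(idempotencyKey)) {
    return invoicesByKey.get(idempotencyKey);
  }

  const invoice = { id: nextInvoiceId++, customerId, status: "created" };

  try {
    if (authorize) await authorize(customerId);
  } catch (error) {
    if (!(error instanceof TimeoutError)) throw error;
    // A timeout is recorded as pending so a later retry can resolve it.
    invoice.status = "payment_pending";
    invoice.retryAfter = Date.now() + 30000; // assumed retry delay
  }

  if (idempotencyKey) invoicesByKey.set(idempotencyKey, invoice);
  return invoice;
}
```

This sketch does not protect against two concurrent calls that share a key. A production version would need a unique constraint or a lock at the storage layer.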

Example 3: Unclear Ownership Boundaries

AI agents struggle when the codebase hides architecture decisions inside informal conventions.

// order-service.js
import { updateInventory } from "../warehouse/inventory.js";
import { sendMarketingEmail } from "../marketing/campaigns.js";
import { createInvoice } from "../billing/invoices.js";
import { trackEvent } from "../analytics/events.js";

export async function completeOrder(order) {
  await updateInventory(order.items);
  await createInvoice(order.customerId, order.total);
  await sendMarketingEmail(order.customerEmail, "order-complete");
  await trackEvent("order_complete", order);
}

This might work. But it also forces every change to understand warehouse, billing, marketing, and analytics at the same time.

The token tax appears when an agent has to change one workflow and suddenly needs broad context across four domains.

A cleaner boundary makes the orchestration explicit:

export async function completeOrder(order, services) {
  await services.inventory.reserve(order.items);
  await services.billing.createInvoice(order.customerId, order.total);
  await services.notifications.orderCompleted(order.customerEmail);
  await services.analytics.orderCompleted(order.id);
}

The improvement is not just aesthetic. It makes dependencies visible. It makes tests easier to isolate. It makes ownership easier to discuss. It gives AI agents less context to load for future edits.
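
One way to see the isolation benefit is to run the injected version against recording fakes. The createRecordingServices helper and the sample order are hypothetical. They only illustrate that the workflow can be exercised without loading warehouse, billing, marketing, or analytics code.

```javascript
// completeOrder as shown above, repeated so the sketch runs on its own.
async function completeOrder(order, services) {
  await services.inventory.reserve(order.items);
  await services.billing.createInvoice(order.customerId, order.total);
  await services.notifications.orderCompleted(order.customerEmail);
  await services.analytics.orderCompleted(order.id);
}

// Hypothetical fake: every service method only records the call it received.
function createRecordingServices() {
  const calls = [];
  const record = (name) => async (...args) => {
    calls.push({ name, args });
  };
  return {
    calls,
    inventory: { reserve: record("inventory.reserve") },
    billing: { createInvoice: record("billing.createInvoice") },
    notifications: { orderCompleted: record("notifications.orderCompleted") },
    analytics: { orderCompleted: record("analytics.orderCompleted") },
  };
}

async function demo() {
  const services = createRecordingServices();
  const order = {
    id: "order-1",
    customerId: "cust-1",
    customerEmail: "buyer@example.com",
    total: 120,
    items: ["sku-1"],
  };
  await completeOrder(order, services);
  return services.calls.map((call) => call.name);
}

demo().then((names) => console.log(names.join(" > ")));
```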

How To Measure Token-Tax Risk

A repository audit should not guess token cost from lines of code.

The better question is: which technical debt patterns force repeated context gathering?

Useful signals include:

modules with high fan-in and unclear ownership
repeated logic across unrelated folders
weak test coverage around failure behavior
broad files that mix workflow, persistence, validation, and side effects
dependencies that cross domain boundaries without an interface
generated or AI-assisted code with no explanation coverage
high review churn or repeated rewrites around the same area

These are not abstract quality complaints. They are places where future AI-assisted changes will probably need more search, more reasoning, more retries, and more human review.
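
As an illustration of the first signal, the sketch below counts fan-in from a list of import edges and flags modules that many files depend on but nobody owns. The edge list, the ownership map, and the threshold of 3 are all assumed inputs. A real audit would extract them from the repository and from whatever ownership file the team uses.

```javascript
// Hypothetical input: one [importer, imported] pair per import statement.
const importEdges = [
  ["orders/order-service.js", "billing/invoices.js"],
  ["checkout/pricing.js", "billing/invoices.js"],
  ["admin/refunds.js", "billing/invoices.js"],
  ["orders/order-service.js", "warehouse/inventory.js"],
];

// Hypothetical ownership map, for example derived from a CODEOWNERS-style file.
const owners = { "warehouse/inventory.js": "team-warehouse" };

// Count distinct importers per imported module.
function fanInByModule(edges) {
  const importers = new Map();
  for (const [from, to] of edges) {
    if (!importers.has(to)) importers.set(to, new Set());
    importers.get(to).add(from);
  }
  return new Map([...importers].map(([target, set]) => [target, set.size]));
}

// Keep only modules at or above the threshold that have no named owner.
function highFanInWithoutOwner(edges, ownerMap, threshold) {
  return [...fanInByModule(edges)]
    .filter(([target, fanIn]) => fanIn >= threshold && !ownerMap[target])
    .map(([target, fanIn]) => ({ target, fanIn }));
}

// Flags billing/invoices.js, which has three importers and no owner.
console.log(highFanInWithoutOwner(importEdges, owners, 3));
```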

What Leaders Should Ask For

If a technical debt report is going to be useful in an AI-assisted engineering environment, it should include more than a list of warnings.

It should show:

The exact code evidence.
Why the finding matters to delivery, reliability, security, or AI-assisted change.
The likely operational cost if it is ignored.
The smallest practical remediation path.
Tests or proof that should exist after cleanup.
A priority order that lets the team act.

The goal is not to shame a codebase. The goal is to make the next change cheaper, safer, and easier to explain.

The Real Point

The future of AI-assisted delivery will not be won only by teams that prompt better.

It will be won by teams whose repositories are easier to reason about.

Clean boundaries, strong tests, explicit ownership, and visible remediation plans reduce human cost. They also reduce AI-agent cost.

That is why technical debt is becoming an AI governance issue.

Clear Code Intelligence is being built around this idea: repository scans should produce evidence-backed findings, code examples, remediation order, and proof after cleanup.

If your team is adopting AI coding tools, the question is not only "how fast can we generate code?"

The harder question is: "how much context does our codebase force every future engineer and agent to relearn?"

Measuring AI-Assisted Technical Debt After the Merge

Clear Code Intelligence — Thu, 11 Jun 2026 23:32:22 +0000

AI-assisted technical debt should not be measured by asking how many lines a model helped write.

That question is easy to count, but it is usually the wrong proxy. A small AI-assisted patch can create expensive operational risk if nobody can explain it, test it, own it, monitor it, or safely modify it later. A large AI-assisted change can be acceptable if the team preserves the right evidence and control points.

The better question is whether the change increases maintenance, review, incident, ownership, or remediation cost after the merge.

That means the useful metrics are not only static code metrics. They are post-merge operating metrics.

1. Review Churn

Track review cycle time and re-review count.

If AI-assisted changes repeatedly bounce through review, the team may be accepting code that is syntactically valid but hard to reason about. Review churn is often an early signal that a change lacks explanation, constraints, ownership context, or test evidence.

Useful signals:

time from pull request open to approval
number of re-review cycles
number of requested clarifications
number of review comments about intent, safety, naming, or hidden coupling
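
A minimal sketch of the first two signals, assuming pull request records have already been exported somewhere. The field names (openedAt, approvedAt, reviewRounds) are hypothetical and do not correspond to any particular hosting API.

```javascript
// Hypothetical pull request records for AI-assisted changes.
const pullRequests = [
  { id: 101, openedAt: "2026-06-01T09:00:00Z", approvedAt: "2026-06-01T15:00:00Z", reviewRounds: 1 },
  { id: 102, openedAt: "2026-06-02T09:00:00Z", approvedAt: "2026-06-04T09:00:00Z", reviewRounds: 4 },
  { id: 103, openedAt: "2026-06-03T10:00:00Z", approvedAt: "2026-06-03T22:00:00Z", reviewRounds: 2 },
];

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function reviewChurn(prs) {
  const hoursToApproval = prs.map(
    (pr) => (Date.parse(pr.approvedAt) - Date.parse(pr.openedAt)) / 3600000
  );
  // A re-review is any review round after the first.
  const reReviews = prs.map((pr) => Math.max(0, pr.reviewRounds - 1));
  return {
    medianHoursToApproval: median(hoursToApproval),
    averageReReviews: reReviews.reduce((sum, n) => sum + n, 0) / prs.length,
  };
}

console.log(reviewChurn(pullRequests));
```

The median is used for approval time because one long-running pull request can distort an average.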

2. Rewrite Rate

Track how often AI-assisted code is rewritten within 30, 60, and 90 days of merge.

Rewrite rate matters because technical debt is not always visible at merge time. A change may pass tests and still create a pattern that becomes expensive once the team needs to extend it.

Useful signals:

files rewritten shortly after merge
repeated edits to the same generated-heavy module
replacement of generic helpers with domain-specific abstractions
removal of duplicated logic introduced across several patches
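
The 30, 60, and 90 day windows can be computed from merge and rewrite dates. This is a sketch under assumptions: the record shape is hypothetical, and deciding what counts as a "rewrite" (for example, most of a file's lines replaced) is left to the team.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical records: merge date of an AI-assisted change and, if any, its rewrite date.
const changes = [
  { file: "billing/invoices.js", mergedAt: "2026-01-10", rewrittenAt: "2026-01-25" }, // 15 days
  { file: "checkout/pricing.js", mergedAt: "2026-01-12", rewrittenAt: "2026-03-01" }, // 48 days
  { file: "orders/order-service.js", mergedAt: "2026-01-15", rewrittenAt: null },
  { file: "analytics/events.js", mergedAt: "2026-02-01", rewrittenAt: "2026-04-20" }, // 78 days
];

// Share of changes that were rewritten within the given number of days after merge.
function rewriteRate(records, windowDays) {
  const rewritten = records.filter((record) => {
    if (!record.rewrittenAt) return false;
    const elapsed = Date.parse(record.rewrittenAt) - Date.parse(record.mergedAt);
    return elapsed <= windowDays * DAY_MS;
  });
  return rewritten.length / records.length;
}

for (const days of [30, 60, 90]) {
  console.log(`${days} days: ${rewriteRate(changes, days)}`);
}
```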

3. Rollback and Hotfix Pressure

Track rollback, hotfix, and emergency patch rate after AI-assisted changes.

This is especially important when changes touch dependencies, auth, external APIs, browser automation, model providers, retries, cancellation, or runtime state. Those boundaries fail in ways that may not appear in basic happy-path tests.

Useful signals:

rollback rate after merge
emergency patch rate
incidents linked to provider or dependency drift
failures caused by malformed model output, timeout behavior, or partial state

4. Owner Clarity

Every generated-heavy module still needs a named owner.

The risk is not that AI helped produce the code. The risk is that nobody understands the operational intent well enough to support it. Ownership clarity matters more as teams move faster, because speed without ownership creates support drag.

Useful signals:

named owner per module or workflow
review route for future changes
escalation path for production issues
runbook or design note for critical behavior

5. Boundary Drift

AI projects accumulate debt at boundaries.

Provider integrations, tool calls, browser state, auth, retries, filesystem access, queues, external APIs, and dependency upgrades all create seams where behavior can drift. A generic code-quality score can miss this because the risky part is often the interaction, not the isolated file.

Useful signals:

new integration edges
repeated provider-specific conditionals
duplicated retry logic
missing cancellation or timeout handling
examples that become production guidance without production-grade tests
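
The first signal, new integration edges, can be approximated by diffing two snapshots of which files call which external boundary. The snapshots and boundary labels below are hypothetical. How the edges are collected (static import scanning, config inspection, or manual tagging) is outside this sketch.

```javascript
// Hypothetical snapshots of integration edges as [caller, boundary] pairs.
const lastRelease = [
  ["agent/planner.js", "provider:openai"],
  ["agent/tools/browser.js", "browser-automation"],
];

const currentBranch = [
  ["agent/planner.js", "provider:openai"],
  ["agent/planner.js", "provider:anthropic"],
  ["agent/tools/browser.js", "browser-automation"],
  ["agent/tools/files.js", "filesystem"],
];

// Return edges present in the newer snapshot that were absent from the older one.
function newIntegrationEdges(before, after) {
  const known = new Set(before.map(([from, to]) => `${from} -> ${to}`));
  return after.filter(([from, to]) => !known.has(`${from} -> ${to}`));
}

// Reports the new Anthropic provider edge and the new filesystem edge.
console.log(newIntegrationEdges(lastRelease, currentBranch));
```

Surfacing these edges in review gives the team a chance to ask whether each new seam has timeout, retry, and cancellation handling before it ships.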

6. Failure-Mode Coverage

Happy-path tests are not enough for AI-assisted workflows.

Teams should track whether important workflows have tests for malformed model output, provider changes, dependency drift, browser failure, timeout behavior, retry exhaustion, invalid credentials, and partial state cleanup.

Useful signals:

failure-mode tests per critical workflow
smoke tests for tool/provider boundaries
regression tests for known incident paths
dependency update checks
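
A simple way to track the first signal is a gap report: for each critical workflow, list the failure modes that have no test yet. The checklist and the test inventory below are hypothetical. In practice the inventory might come from test names, tags, or a hand-maintained file.

```javascript
// Hypothetical checklist drawn from the failure modes named above.
const requiredModes = [
  "malformed_model_output",
  "timeout",
  "retry_exhaustion",
  "invalid_credentials",
  "partial_state_cleanup",
];

// Hypothetical inventory: failure modes each critical workflow already tests.
const testedModes = {
  "invoice-generation": ["timeout", "retry_exhaustion"],
  "browser-checkout": [
    "malformed_model_output",
    "timeout",
    "retry_exhaustion",
    "invalid_credentials",
    "partial_state_cleanup",
  ],
};

// For each workflow, return the required failure modes that lack a test.
function failureModeGaps(required, tested) {
  const gaps = {};
  for (const [workflow, modes] of Object.entries(tested)) {
    const covered = new Set(modes);
    gaps[workflow] = required.filter((mode) => !covered.has(mode));
  }
  return gaps;
}

console.log(failureModeGaps(requiredModes, testedModes));
```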

7. Explanation Coverage

AI-assisted changes need an evidence trail.

That evidence does not need to be heavy, but it should exist. The team should be able to connect important code back to a requirement, design decision, constraint, owner, test, and verification result.

Useful signals:

ADRs or short design notes for critical changes
clear acceptance criteria
pull request explanation quality
traceability from finding to remediation proof
documented reason for any suppression or accepted risk

8. Verification Latency

Track the time between each stage: generated patch, human review, production validation, and remediation proof.

Long verification latency means the team may be moving faster than its ability to prove safety. That is where debt compounds: not in the code alone, but in the gap between change and confidence.

Useful signals:

time from generated patch to review
time from review to test proof
time from deployment to validation
time from finding to verified remediation

The Practical Audit Question

The risk is not AI-assisted code.

The risk is code the team cannot explain, test, own, monitor, and safely change later.

A useful technical debt report should therefore do more than list findings. It should translate findings into operating metrics that technical leaders can track after remediation:

Did review churn go down?
Did rewrite rate go down?
Did rollback pressure go down?
Did owner clarity improve?
Did boundary drift become visible?
Did failure-mode coverage improve?
Did explanation coverage improve?
Did verification latency shrink?

That is the difference between a scanner output and a debt reduction system.

Technical Debt in AI Agent Repositories Lives at the Boundaries

Clear Code Intelligence — Thu, 11 Jun 2026 18:04:00 +0000

AI agent repositories create a different technical debt profile than traditional CRUD applications.

In a standard web app, debt often shows up as large files, missing tests, unclear ownership, duplicated logic, stale dependencies, weak security controls, or architecture that makes changes expensive. Those still matter in AI repositories, but they are not the whole story.

The most expensive debt often lives at the boundaries.

Provider boundaries

An agent project may integrate OpenAI, Anthropic, local models, browser automation, vector stores, external APIs, auth systems, and file access. If those boundaries are not explicit, every new provider or workflow increases the chance of hidden coupling.

Runtime boundaries

Agents execute plans, call tools, retry failed steps, touch browsers, parse model output, and handle partial state. A weak runtime boundary makes it hard to answer basic operational questions: what ran, what failed, what was retried, and what state was left behind?

Example boundaries

In fast-moving AI projects, examples become production guidance. If examples are not tested, versioned, and kept close to real usage, they become a source of silent debt.

Dependency boundaries

AI SDKs and provider packages move quickly. A stale dependency policy can become a production risk because provider behavior changes, API contracts shift, and security updates arrive frequently.

Observability boundaries

When an agent workflow fails, the team needs to know whether the failure came from the model, the prompt, the tool, the browser, an external service, a dependency, or application code. Without that traceability, remediation turns into guessing.

A useful technical debt audit for AI repositories should therefore include more than lint findings. It should connect source-level evidence, dependency signals, runtime risk, examples, documentation, ownership, CI proof, and a remediation path.

The score is not the product. The evidence trail is the product.

The best reports help teams decide:

what needs to be fixed now
what can be monitored
what is an accepted tradeoff
what needs owner approval
what needs verification after cleanup

That is the difference between a noisy scan and a report that can drive engineering action.

What a Useful Technical Debt Finding Should Contain

Clear Code Intelligence — Thu, 11 Jun 2026 16:21:22 +0000

Most technical debt reports fail for a simple reason: they list concerns, but they do not create decisions.

A useful finding should help an engineer understand the risk and help a leader understand whether the fix deserves time. That requires more than a severity label.

Here is a practical structure for a technical debt finding that can move from report to remediation.

1. Stable Identity

Every finding needs an identity that survives small code changes.

At minimum:

rule ID;
fingerprint;
file path;
line range;
first seen date;
last seen date;
current state.

This is what prevents the same issue from being rediscovered every week as if it were new.
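
One possible way to build a fingerprint that survives small code changes is to hash the rule ID, the file path, and a whitespace-normalized copy of the offending code, while leaving line numbers out of the hash. The sketch below is an assumption about how that might look, not a description of any particular scanner. It uses a small FNV-1a style hash over UTF-16 code units only to stay dependency-free.

```javascript
// Small non-cryptographic 32-bit hash (FNV-1a style), used only for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical fingerprint: line numbers are deliberately excluded so that
// moving or re-indenting the code does not create a "new" finding.
function fingerprint(ruleId, filePath, snippet) {
  const normalized = snippet.replace(/\s+/g, " ").trim();
  return fnv1a(`${ruleId}|${filePath}|${normalized}`);
}

const original = fingerprint("duplicate-logic", "checkout/pricing.js", "return subtotal * 0.85;");
const reindented = fingerprint("duplicate-logic", "checkout/pricing.js", "    return   subtotal * 0.85;\n");
console.log(original === reindented); // true: whitespace does not change the identity
```

The line range is still stored on the finding for navigation. It just does not take part in the identity. This approach will treat a renamed file as a new finding, which is a tradeoff a team may or may not accept.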

2. Source Evidence

The finding should show why it exists.

Weak version:

This module is complex.

Useful version:

src/billing/webhooks.ts contains provider parsing, event validation,
subscription mutation, and notification emission in one route handler.
The function has multiple provider-specific branches and no nearby tests
covering duplicate delivery or stale event timestamps.

Evidence turns a warning into a conversation the team can verify.

3. Risk Explanation

The report should explain why the finding matters.

Risk can come from:

production exposure;
change frequency;
unclear ownership;
sensitive data;
revenue path;
dependency age;
weak tests;
architectural coupling.

Complexity alone is not always urgent. Complexity in a high-change billing path is different.

4. Confidence Level

Not every finding deserves the same trust.

Confidence should be explicit:

high confidence: direct evidence and clear remediation path;
medium confidence: strong signal but needs team validation;
low confidence: possible concern, included for review.

This helps teams avoid scanner fatigue.

5. State

The finding should not live forever as simply "open."

Better states:

active debt;
accepted risk;
false positive;
suppressed with reason;
generated or vendor exclusion;
remediated;
needs verification.

This distinction is especially important for leadership reporting. Accepted risk should be visible, but it should not be mixed with unmanaged debt.
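
For leadership reporting, the separation can be as simple as counting findings per state instead of reporting one "open" total. The finding IDs below are hypothetical. The state names are the ones listed above.

```javascript
// Hypothetical findings, each carrying one of the states listed above.
const findings = [
  { id: "F-1", state: "active debt" },
  { id: "F-2", state: "accepted risk" },
  { id: "F-3", state: "active debt" },
  { id: "F-4", state: "false positive" },
  { id: "F-5", state: "needs verification" },
];

// Count findings per state so accepted risk is reported next to, not inside, active debt.
function countByState(items) {
  const counts = {};
  for (const item of items) {
    counts[item.state] = (counts[item.state] || 0) + 1;
  }
  return counts;
}

console.log(countByState(findings));
```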

6. Remediation Guidance

The finding should describe the smallest practical path to reduce risk.

Example:

Extract webhook payload normalization into a pure function.
Add contract fixtures for duplicate delivery, missing customer IDs,
and stale timestamps. Move state mutation behind an idempotent service.
Add one regression test proving duplicate provider events cannot emit
duplicate billing events.

That is more useful than "refactor this."

7. Verification Path

A debt finding is not complete until the team knows how to prove it improved.

Verification might include:

added tests;
reduced dependency exposure;
removed duplicated logic;
smaller critical function surface;
new CI rule;
Semgrep rule pass;
dependency audit pass;
documented accepted risk.

The proof step is what keeps technical debt cleanup from becoming subjective.

The Standard

A strong finding should let someone answer:

Where is the evidence?
Why does it matter?
Who owns it?
What should be done first?
What proves the fix worked?

That is the difference between a scanner warning and an actionable technical debt audit.

Technical Debt Audits Need Evidence, Not Vibes

Clear Code Intelligence — Thu, 11 Jun 2026 15:46:46 +0000

Technical debt is not simply messy code. It is the gap between the system a team has and the system the business now needs.

That distinction matters because many teams treat debt like an aesthetic problem. They point at old files, large functions, missing tests, inconsistent patterns, or dependency warnings and say the codebase is unhealthy. Those signals may be true, but they are not enough to guide investment.

A useful technical debt audit should answer sharper questions:

What evidence exists in the repository?
What business or delivery risk does the evidence imply?
Which findings are active debt, and which are consciously accepted tradeoffs?
What remediation path is practical?
How will the team prove the debt was reduced after the fix?

Without that structure, a scan becomes another noisy dashboard. The engineering leader still has to translate warnings into decisions.

Debt Becomes Useful When It Has Evidence

A finding should point back to the code, configuration, dependency graph, test surface, or build behavior that caused it. If a report cannot show where the concern comes from, it is difficult for a team to trust the recommendation.

For example, these are very different findings:

The codebase has weak test coverage.

The payment webhook handler parses provider payloads, mutates subscription state,
and emits billing events without nearby unit or integration tests. The module is
changed frequently and has no contract tests around duplicate webhook delivery.

The second version is actionable. It names the affected area, explains the risk, and gives the team a starting point for remediation.

Severity Needs Business Context

Static analysis can detect many issues, but priority still depends on context. A duplicated helper in an internal admin screen is rarely equivalent to duplicated authorization logic in an API path.

A good audit separates technical signals from decision signals:

Technical signal: duplicated branching, stale dependency, missing validation, broad exception handling, no test fixture, circular import, unchecked user input.
Decision signal: production exposure, change frequency, revenue path, data sensitivity, onboarding friction, incident history, release bottleneck.

The goal is not to create a longer issue list. The goal is to tell a team what should be fixed first and why.
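
A rough sketch of how decision signals might turn the same technical signal into different priorities. The weights are invented for illustration and would need calibrating against a team's own incident and delivery history. The two findings mirror the admin-screen and authorization examples above.

```javascript
// Hypothetical weights per decision signal; not a recommended scoring scheme.
const decisionWeights = {
  productionExposure: 3,
  revenuePath: 3,
  dataSensitivity: 2,
  changeFrequency: 2,
  incidentHistory: 2,
};

// Both findings share a technical signal (duplication) but differ in decision signals.
const debtFindings = [
  { title: "Duplicated helper in internal admin screen", signals: [] },
  {
    title: "Duplicated authorization logic in API path",
    signals: ["productionExposure", "dataSensitivity", "changeFrequency"],
  },
];

// Sum the weights of the decision signals attached to a finding.
function priorityScore(finding, weights) {
  return finding.signals.reduce((total, signal) => total + (weights[signal] || 0), 0);
}

// Return a new array ordered from highest to lowest priority.
function prioritize(items, weights) {
  return [...items].sort((a, b) => priorityScore(b, weights) - priorityScore(a, weights));
}

for (const finding of prioritize(debtFindings, decisionWeights)) {
  console.log(priorityScore(finding, decisionWeights), finding.title);
}
```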

Active Debt vs Accepted Tradeoff

Not every imperfection is debt that should be paid immediately. Sometimes a shortcut is deliberate, documented, and bounded. That is an accepted tradeoff.

Active technical debt is different. It keeps charging interest:

Engineers avoid a module because changes are slow or risky.
AI-generated code multiplies inconsistent patterns.
Warnings are ignored because the scanner has no ownership model.
Dependencies drift because upgrade risk is unclear.
Tests exist, but not around the parts that actually fail in production.

When an audit labels everything as urgent, nothing is urgent. Strong reports make accepted tradeoffs explicit and keep active debt visible.

Remediation Should Be Specific

The best technical debt recommendations are not vague instructions like "refactor this module" or "add tests." They describe the smallest useful path to reduce risk.

Example remediation plan:

1. Extract payload normalization from the webhook route into a pure function.
2. Add contract fixtures for duplicate delivery, missing customer IDs, and stale event timestamps.
3. Move state mutation behind an idempotent subscription service.
4. Add a regression test that proves the same provider event cannot produce duplicate billing events.

That is the difference between a report that guides the work and a report that only creates more of it.

Proof Matters After the Fix

Technical debt reduction should produce evidence too. After remediation, a team should be able to show what changed:

risky files became smaller or less coupled;
unsupported dependencies were upgraded or removed;
critical paths gained tests;
repeated patterns were consolidated;
scanner warnings dropped without suppressing real issues;
build, lint, or review gates now prevent regression.

This proof loop is especially important as teams adopt AI coding tools. AI can accelerate delivery, but it can also accelerate inconsistency. The answer is not to reject AI-generated code. The answer is to improve evidence, review discipline, and remediation feedback loops.

A Better Audit Model

An effective repository audit should include:

an executive summary for engineering leadership;
a prioritized debt register;
source-level evidence for each major finding;
representative code snippets;
remediation options with expected impact;
dependency and security hygiene;
test and CI coverage gaps;
architecture and ownership risks;
proof criteria for post-remediation verification.

Technical debt audits should help teams decide, not just detect.

That is the standard Clear Code Intelligence is building toward: repository scans that turn technical debt into evidence, priority, remediation, and proof.