There’s a certain comfort in seeing a pull request pass all checks. Green builds, no failing tests, everything looks solid.
But here’s the uncomfortable question: what exactly did your tests cover?
That’s where code coverage enters the conversation—not as a vanity metric, but as a signal. When used correctly, it helps you understand not just whether your code works, but how much of it you’ve actually verified.
Coverage Is a Map, Not a Score
At its core, code coverage measures how much of your codebase is executed when your test suite runs. The most common types include:
Line coverage – which lines were executed
Branch coverage – which logical paths (if/else, switch cases) were taken
Function/method coverage – which functions were invoked
Most tools default to line coverage because it’s easy to compute and understand. But it can also be misleading.
A test that executes a line doesn’t necessarily validate its correctness. You can hit 100% line coverage and still miss critical bugs—especially around edge cases and branching logic.
So instead of treating coverage as a goal, it’s more useful to treat it as a map of untested territory.
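To make that gap concrete, here is a minimal sketch (Python used for brevity; the function and its shipping rule are fabricated for illustration). A single test executes every line of the function, yet exercises only one of its two branches:

```python
# One test can reach 100% line coverage here while hitting
# only one of the two branches (50% branch coverage).

def shipping_fee(total: float) -> float:
    """Flat fee under 100, free shipping otherwise (illustrative rule)."""
    fee = 0.0
    if total < 100:
        fee = 9.99
    return fee

# This single call executes every line (the `if` line runs even
# when its condition is false)...
assert shipping_fee(50) == 9.99
# ...but the total >= 100 path is never tested, so a regression
# there would slip through despite "full" line coverage.
```

This is exactly why branch coverage catches risks that line coverage hides.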
Why Coverage Matters in CI/CD
CI/CD pipelines are about confidence. Every commit that flows through your pipeline is a potential production deployment. Coverage acts as a guardrail—not by guaranteeing correctness, but by highlighting risk.
When integrated into CI/CD, coverage helps you:
Detect untested code introduced in a pull request
Prevent silent degradation of test quality over time
Encourage consistent testing practices across contributors
More importantly, it creates accountability. Without coverage checks, it’s easy for teams to gradually stop writing meaningful tests—especially under delivery pressure.
The Baseline Question: How Much Is Enough?
This is where things get opinionated.
You’ll often hear numbers like 70%, 80%, or 90% thrown around as “good coverage.” In reality, the right number depends on your domain:
A prototype or internal tool might tolerate lower coverage
A financial or healthcare system should aim much higher
Legacy systems often start low and improve incrementally
What matters more than the number itself is consistency and trend.
A stable 75% with thoughtful tests is far more valuable than a forced 90% filled with shallow assertions.
That said, many teams adopt practical thresholds:
80% line coverage as a general baseline
Higher thresholds (85–90%) for critical modules
Lower thresholds temporarily when dealing with legacy code
The key is to treat thresholds as minimum quality gates, not targets to game.
Enforcing Coverage in a Pipeline
Modern CI systems make it straightforward to enforce coverage, but the implementation details matter.
A typical flow looks like this:
1. Run tests with coverage enabled
2. Generate a coverage report (e.g., XML, HTML, or JSON)
3. Compare results against a defined threshold
4. Fail the pipeline if the threshold is not met
For example, in a PHP/Laravel setup using PHPUnit:
test:
  script:
    - php artisan test --coverage --min=80
Or with more control using PHPUnit directly:
phpunit --coverage-clover=coverage.xml
Then you can enforce thresholds either by having a pipeline step inspect the report that PHPUnit generates, with the report output configured in phpunit.xml (PHPUnit 9.3+ syntax):

<coverage>
    <report>
        <clover outputFile="coverage.xml"/>
        <text outputFile="php://stdout" showUncoveredFiles="true"/>
    </report>
</coverage>
Or by using external tools like SonarQube, which allow you to define quality gates that fail builds when coverage drops below a certain percentage.
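The threshold comparison itself can be a very small pipeline step. As a sketch (Python for brevity, with a fabricated Clover report inlined as a string; a real step would read the coverage.xml generated above), parse the Clover metrics and compare them against the gate:

```python
# Sketch of a coverage quality gate: read statement counts from a
# Clover-format report and compare against a minimum percentage.
# SAMPLE_CLOVER is fabricated for illustration.
import xml.etree.ElementTree as ET

SAMPLE_CLOVER = """\
<coverage generated="0">
  <project timestamp="0">
    <metrics statements="200" coveredstatements="164"/>
  </project>
</coverage>
"""

def line_coverage(clover_xml: str) -> float:
    """Return line (statement) coverage percentage from a Clover report."""
    root = ET.fromstring(clover_xml)
    metrics = root.find("./project/metrics")
    statements = int(metrics.get("statements"))
    covered = int(metrics.get("coveredstatements"))
    return 100.0 * covered / statements if statements else 100.0

def meets_threshold(clover_xml: str, minimum: float = 80.0) -> bool:
    """True when coverage is at or above the quality gate."""
    return line_coverage(clover_xml) >= minimum

print(f"coverage: {line_coverage(SAMPLE_CLOVER):.1f}%")  # coverage: 82.0%
print("gate:", "pass" if meets_threshold(SAMPLE_CLOVER) else "fail")
```

In a real pipeline this script would exit non-zero when the gate fails, which is what actually breaks the build.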
The “Diff Coverage” Approach
One of the more effective strategies—especially in mature teams—is diff coverage.
Instead of enforcing coverage across the entire codebase, you enforce it only on new or changed code.
This solves a common problem: legacy codebases with low coverage. Rather than blocking progress, you ensure that every new line added is properly tested.
Tools like diff-cover, GitLab’s built-in coverage visualization, or SonarQube can help implement this approach.
It’s a small shift, but it changes team behavior significantly.
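Conceptually, diff coverage is simple: intersect the lines a change touched with the lines the tests executed. A toy sketch (Python for brevity; the inputs below are fabricated, whereas real tools derive them from `git diff` and the coverage report):

```python
# Toy diff coverage: of the lines a PR added or modified, what
# percentage did the test suite execute?

# lines executed by tests, per file (would come from a coverage report)
covered = {"app/Invoice.php": {10, 11, 12, 20}}

# lines added or modified in the PR, per file (would come from git diff)
changed = {"app/Invoice.php": {11, 12, 13, 14}}

def diff_coverage(covered: dict, changed: dict) -> float:
    """Percentage of changed lines that are covered by tests."""
    total = hit = 0
    for path, lines in changed.items():
        total += len(lines)
        hit += len(lines & covered.get(path, set()))
    return 100.0 * hit / total if total else 100.0

print(f"diff coverage: {diff_coverage(covered, changed):.0f}%")  # 50%
```

Gating on this number means a PR can land in a 40%-covered legacy codebase, as long as its own changes are well tested.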
Where Coverage Falls Short
Coverage is often misunderstood because it’s easy to measure.
But what it doesn’t tell you is just as important:
It doesn’t guarantee meaningful assertions
It doesn’t ensure edge cases are handled
It doesn’t validate business logic correctness
A test that calls a method and asserts true === true will still increase coverage.
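As a sketch of that failure mode (Python for brevity; `apply_tax` and both tests are fabricated for illustration), these two tests produce identical coverage, but only one of them would ever catch a regression:

```python
# Both "tests" below yield the same line coverage for apply_tax,
# but only the second verifies anything.

def apply_tax(amount_cents: int) -> int:
    """Hypothetical example: add 20% tax, in integer cents."""
    return amount_cents + amount_cents // 5

def test_shallow():
    apply_tax(100)   # every line executes...
    assert True      # ...but nothing is validated

def test_meaningful():
    assert apply_tax(100) == 120  # pins the actual behavior

test_shallow()
test_meaningful()
```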
That’s why high-performing teams combine coverage with:
Code reviews focused on test quality
Mutation testing (to verify test effectiveness)
Static analysis and type checking
Coverage is a signal—but it needs context.
Making It Work in Real Teams
The most successful use of coverage in CI/CD isn’t strict enforcement—it’s gradual alignment.
Start by measuring. Then visualize. Then enforce lightly.
Over time, raise expectations as the team adapts.
A good pattern looks like this:
Introduce coverage reporting without enforcement
Add a soft threshold (warnings, not failures)
Transition to hard thresholds for new code
Gradually raise the bar where it makes sense
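One way to "raise the bar" without manually bumping thresholds is a coverage ratchet: the build fails only if coverage drops below the best value recorded so far, and the record rises with each improvement. A sketch (in practice the baseline would live in a committed file or CI variable; the names and slack value here are illustrative):

```python
# Coverage "ratchet" sketch: the gate follows the team's best
# result upward instead of sitting at a fixed number.

def ratchet(current: float, baseline: float, slack: float = 0.5):
    """Return (passes, new_baseline).

    `slack` tolerates tiny dips (e.g. deleting well-tested code)
    so the gate doesn't punish harmless refactors.
    """
    passes = current >= baseline - slack
    new_baseline = max(baseline, current) if passes else baseline
    return passes, new_baseline

print(ratchet(current=76.2, baseline=75.0))  # improvement: bar rises
print(ratchet(current=72.0, baseline=76.2))  # real drop: gate fails
```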
This avoids the common trap of teams gaming the system just to pass builds.
The Real Value
Code coverage isn’t about hitting a number. It’s about reducing uncertainty.
When a deployment goes out, you want to know that the critical paths—the things that matter most—have been exercised, validated, and protected against regression.
Coverage won’t tell you everything.
But without it, you’re flying blind.