Title: Shifting the testing culture: Code coverage
Date: 2024-11-22
Summary: How we collect, track, and use code coverage data to improve our testing culture

!!! note
    This is the third post in a series on shifting our testing culture.

    1. [Motivation]
    2. [Infrastructure]
    3. Code coverage

[Motivation]: {filename}2024-11-18-testing-motivation.md
[Infrastructure]: {filename}2024-11-19-testing-infrastructure.md

Most people will say that code coverage is a measure of how well-tested your
code is. However, it's better to say it's a measure of how _untested_ a codebase
is. For example:

```swift
func myFunction(input: Int) -> String {
    if input == 0 {
        return "zero"
    } else if input == 1 {
        return "one"
    } else {
        return "other"
    }
}

func testMyFunction() {
    _ = myFunction(input: 1)
    _ = myFunction(input: 2)
}
```

This yields 66.7% code coverage: the test calls the function with inputs `1`
and `2`, so every branch except `return "zero"` is executed. But although the
test passes, it doesn't validate any business logic. The example is a bit
contrived, but in a more realistic scenario where tools report 66.7% code
coverage, you don't actually know whether that code is actively tested or just
happens to be executed while running tests. What you _do_ know is that the
other 33.3% is _definitely not_ tested.
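
For illustration, here's what that same test might look like with assertions
that actually validate the behavior it exercises; coverage would still be
66.7%, but the covered lines would now be genuinely tested (the
`MyFunctionTests` class name is just for this example):

```swift
import XCTest

final class MyFunctionTests: XCTestCase {
    func testMyFunction() {
        // Same inputs as before, but now the results are verified.
        XCTAssertEqual(myFunction(input: 1), "one")
        XCTAssertEqual(myFunction(input: 2), "other")
    }
}
```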

## `llvm-cov`

Pedantry aside, code coverage is still important even if the coverage level is
already high and there isn't much untested code. That's because code coverage
isn't just a number; it also comes with a detailed **source overlay**: the
source code with highlights that indicate which lines, functions, and code
branches were invoked while running tests, and how often:

![Source overlay as generated by llvm-cov](/images/source-overlay.png)

It's clear that although this method has a test, the `input == 0` branch wasn't
executed while running the tests. The `return "one"` and `return "other"` lines
are executed once, and the rest of the method is executed twice. Getting full
coverage of this function requires adding a test that calls it with
`input == 0`.
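
As a sketch, that could be a test case like the following added to the test
class above (the test name is illustrative):

```swift
func testMyFunctionWithZero() {
    // Exercises the previously unexecuted `input == 0` branch.
    XCTAssertEqual(myFunction(input: 0), "zero")
}
```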

On iOS, Xcode has built-in support for reporting code coverage; it presumably
wraps [llvm-cov] for user friendliness. Invoking `llvm-cov` directly, however,
gives a ton of extra options, including exporting in different formats like
JSON and HTML.

Since `llvm-cov` operates on a test bundle and we have a test bundle for each of
our modules, we can get detailed information about a module's code coverage:
which file and which lines within a file, how many total lines are testable and
how many are covered, etc.

In the past we used the various output formats to integrate with [codecov], but
eventually moved to an in-house web portal where this data is stored and easily
accessible. We've also built tooling integration so developers can run a simple
command on their own machine to get `llvm-cov`'s output generated for their
modules, though viewing it in Xcode is often the better choice for local
development. The only part that's missing is seeing these source overlays in
GitHub so you can immediately tell which code changes require more tests.

[codecov]: https://codecov.io
[llvm-cov]: https://llvm.org/docs/CommandGuide/llvm-cov.html

## Tracking coverage

Every time a module's source code or tests change, we capture the new coverage
metrics in a database to keep track of a full history of each module's code
coverage. We aggregate the data by team and for the codebase as a whole, and
display it on the web portal. This gives anyone immediate access to answer
common questions about their team's testing progress.

Developers are often interested in more than just the raw code coverage
numbers: they also want to understand what code isn't tested so they can add
more coverage. To make this easier, the web portal also displays the source
overlays for every source file in a module, so there's no need to open Xcode to
view them.

## Minimum coverage

> _When a measure becomes a target, it ceases to be a good measure_
>
> \- Goodhart's law

Many companies enforce a minimum coverage level that all code has to hit.
Although it sounds enticing, it could also turn into exactly what Goodhart's
law warns about: developers will likely work around it if they feel they need
to, or write low-quality tests just to hit the minimum. It's also not clear
what number to pick and what to base it on.

So instead we took a slightly different approach: each module gets to set _its
own minimum coverage percentage_ in the `BUILD` file, and that percentage is
then enforced by our CI system. It's a good balance between establishing a
minimum and giving developers enough control that it doesn't feel overbearing.

Developers are free to set that minimum to any percentage they want through a
pull request, but lowering it too drastically requires a good justification and
comes with social friction. Increases are often celebrated as a job well done.

We also bump the minimum for a given module automatically if it's too low
compared to its actual coverage. A module whose minimum coverage is set to 10%
but actually has 50% coverage would get its minimum increased to 45%. This
makes it harder to accidentally drop coverage.
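
A minimal sketch of what such a bump rule might look like, assuming (based on
the 10% → 45% example above) a fixed 5-point buffer below actual coverage; the
function name and buffer are illustrative, not our actual CI implementation:

```swift
/// Hypothetical rule: raise the configured minimum toward actual coverage,
/// keeping a small buffer, and never lower it.
func bumpedMinimum(configured: Double, actual: Double, buffer: Double = 5) -> Double {
    max(configured, actual - buffer)
}

bumpedMinimum(configured: 10, actual: 50) // 45
bumpedMinimum(configured: 60, actual: 50) // 60 (never lowered)
```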

Other ideas we've played around with but haven't implemented:

* not allowing decreases in minimum coverage below a certain threshold
* enabling a global minimum but setting it very low (e.g. 20%), in the hope
that having to write _some_ tests will lead to writing more
* higher minimum coverage for code that uses highly testable architectural
components (as opposed to legacy code that's harder to test)
* tech lead approval for dropping a module's minimum coverage
* requiring adding/updating tests on PRs (changing code without needing to
change tests generally means the code is not tested)

## Raising the bar

Having a long history of coverage data available has been a big factor in
raising the quality bar. During biannual planning, teams get a clear overview of
their average coverage and can quickly see which of their modules have the most
room for improvement, and attach a measurable goal to it.

We look at overall trends and average coverage levels per module type, and set
goals for them as guidance for other teams. Nothing big, just a couple of
percentage points per quarter, but it makes testing a talking point. When we
introduce a new tool, e.g. snapshot tests, we can see how that affects the trend
(spoiler alert: our developers strongly prefer snapshot tests) and whether to
invest more time/effort.

Developers ask for more or better tests during code review. There has been a
noticeable uptick in code changes that just introduce a bunch more tests for
existing code. More refactors start by ensuring code coverage is solid and
closing any testing gaps. When an incident happens, the fix is expected to
include a test, and the post-mortem documents the code coverage of the code
that was at fault, with an action item to increase coverage if it's too low.

Just like a more testable architecture and better tooling, having insightful
code coverage data played a big role in the gradual shift of our iOS testing
culture. We started with near 0% coverage, and all the improvements over the
years have gotten us to an overall code coverage of 60% and climbing.

To me it's one of the most fascinating leaps in maturity I've seen in my time
at Lyft, with a ton of learnings along the way.