
DevTools > Meaningful Measurements

tl;dr: Don't use metrics as a crutch to avoid thinking and understanding.

Metrics are often considered the only way to observe and validate success, making them incredibly important to get right. I have a love-hate relationship with metrics: I find them useful as a proxy for estimating success, but metrics treated as ends in and of themselves lead to pathological behavior.

At this point in my career, I still prefer a meaningful qualitative explanation to multiple abstract and obscure metrics: the former implies proper understanding; the latter, not so much.

Metrics are a map, not the territory

Before you invest a lot in metrics – particularly those aimed at measuring developer productivity – spend some time reading about the state of the art on the subject. You'll find that it's tough to get clear answers about what works better – and you probably don't have the time to research the best ways to evaluate engineering productivity.

Metrics, in general, are a reduction of the world into something simpler that is easier to grasp, track and coordinate on. Never assume that instrumentation represents the world entirely – nor incentivize others to do so.

Retention metrics measure the problem and not the solution

Before you set goals around increasing your tool's retention metrics, check whether your users have any alternatives.

Your coworkers are generally highly motivated, incentivized people. If there's only one way to achieve their goals – and that's through your tool – then they'll suffer through it regardless of quality.

Accordingly, treat high retention as a sign that the problem is essential to the business; it generally doesn't reflect the quality of the tool itself.

Don't rely on usage statistics to dismiss solutions

Conversely, if people aren't currently using your tools, it can be for two reasons: either the tools aren't part of their daily workflow, or they're not particularly well built. A rarely used tool can still be essential when it is needed.

It's worth looking into the lack of use carefully; often enough, low usage statistics hide valuable opportunities, especially if there are no convincing qualitative explanations.

Set goals on business outcomes instead of tools

The argument the book How to Measure Anything makes against "unmeasurable" work is that there is always some change in the world you want to achieve. Measure that change as your metric instead of the otherwise intangible or hard-to-quantify changes you're making.

Some positives: it'll keep you and your team focused on the actual problem instead of treating the tools as ends in themselves, and moving those metrics is guaranteed to translate into meaningful victories. That's not true at all for tool- or execution-specific metrics.

On the flip side, attributing "how much value your specific tool added" can be much more challenging. Hopefully, people won't care too much about this; but if they do (and you can't convince them otherwise), you could consider adding some proxy metrics as an explicitly and extremely tentative measure of progress, with caveats.

Complex metrics cost more than they're worth

Simple, easy-to-explain metrics are much more likely to point you in the right direction. Complex models are hard to validate, understand and execute on – and often end up costing a tremendous amount of energy (and several data scientists) to maintain and understand.

You're also unlikely to find out quickly if the metric breaks, which often has disastrous consequences: not unlike navigating with a broken compass. More complex metrics can also make it significantly more challenging to do any statistical analysis to confirm whether changes are significant or simply noise.

If you can't set goals on business metrics (previous point) for whatever reason, keep your tool metrics extremely obvious and direct. Some examples of simple numbers which are easy to compare and reason about include time spent waiting for a computer, release cadence, and crashes in production.

Do a sensitivity test

Be sure that your metrics catch actual changes in behavior. Test a couple of scenarios you want your metrics to capture by generating fake numbers corresponding to those situations and checking that they show up (and don't get lost in the noise).

For example, a metric that only shows significant changes if 50%+ of your users can't use the tool isn't very valuable: you'll have far louder and quicker signals (such as a mob carrying pitchforks complaining about the broken tool).
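As an illustration, here's a minimal sketch of such a sensitivity test in Python. Everything in it is invented for illustration: a fake "mean build time" metric, simulated healthy days to estimate day-to-day noise, and two injected scenarios checked against a three-standard-deviation threshold.

```python
import random
import statistics

random.seed(42)

def simulate_metric(n_users=200, n_broken=0, slowdown=10):
    """Hypothetical daily metric: mean build time (seconds) across users.
    All numbers are invented purely to exercise the sensitivity test."""
    times = []
    for i in range(n_users):
        t = random.gauss(60, 10)  # healthy baseline: roughly 60s builds
        if i < n_broken:
            t *= slowdown         # affected users wait `slowdown` times longer
        times.append(max(t, 1.0))
    return statistics.mean(times)

# 1. Establish normal day-to-day noise from simulated healthy runs.
healthy_days = [simulate_metric() for _ in range(30)]
baseline = statistics.mean(healthy_days)
noise = statistics.stdev(healthy_days)

# 2. Inject the scenarios you care about and check whether the metric
#    reacts beyond that noise (here: more than three standard deviations).
def detects(**scenario):
    observed = simulate_metric(**scenario)
    return abs(observed - baseline) > 3 * noise

print(detects(n_broken=20))              # 10% of users 10x slower: stands out
print(detects(n_broken=1, slowdown=2))   # one user 2x slower: likely lost in the noise
```

If your real metric only fires for the first scenario, you've learned exactly how coarse your signal is – and whether that's acceptable for the failures you care about.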

You might find value in looking at higher percentiles and outliers, and potentially playing with an HDR histogram.
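For instance, here's a quick sketch (again with invented numbers) of why a higher percentile can surface a tail problem that a mean or median smooths over: 5% of simulated builds hit a pathological slow path, and only the p99 makes that unmistakable.

```python
import random

random.seed(7)

# Hypothetical build-time samples (seconds); the numbers are invented.
# 5% of builds hit a pathological path and take around ten minutes.
samples = ([random.gauss(60, 10) for _ in range(950)]
           + [random.gauss(600, 60) for _ in range(50)])

def percentile(xs, q):
    """Nearest-rank percentile: the value at the q-th percent position."""
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(q / 100 * len(xs)))]

mean = sum(samples) / len(samples)
print(f"mean: {mean:.0f}s")                     # pulled up, but hides the story
print(f"p50:  {percentile(samples, 50):.0f}s")  # barely notices the slow builds
print(f"p99:  {percentile(samples, 99):.0f}s")  # makes the pathology obvious
```

An HDR histogram does the same job far more efficiently at scale, recording values with bounded relative error so you can query arbitrary percentiles cheaply.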

Always have qualitative explanations

Most importantly, you should always be able to explain, in words, why your metrics moved in a particular direction. Ideally, you can also triangulate an approximation of the change from secondary sources.

If you can't explain the changes, you should stop trusting the metric and start digging into the instrumentation. Never celebrate metric-only victories that don't have a meaningful explanation. This is one of those rare times I feel comfortable giving unqualified advice.

Do investigate metric-only losses carefully: either your tools broke, your metric is faulty, your understanding of your metric is incorrect, or your instrumentation is buggy. Try to triangulate your observations with multiple sources of data.


Add your comments to this Twitter thread, or drop me an email.