Why Use the DORA Metrics?


What are the DORA Metrics?

Measuring the performance of software delivery teams is notoriously difficult. Some standard metrics, such as velocity, are still useful but are not a measure of overall performance. Others, such as code churn rate or the number of lines of code written, are not indicative of performance and are probably not good metrics for any other purpose either. Many metrics can be tracked and evaluated over time, but the DORA metrics stand out for measuring how well a team delivers value.


Software delivery teams use the DORA metrics to measure their performance by evaluating criteria related to throughput and stability. The four metrics tracked are: deployment frequency, lead time for changes, mean time to restore, and change failure rate. Combined, the results allow teams to be categorized on a range from “low” to “elite” performers, balancing speed against quality. Instead of focusing on a team’s output, these metrics evaluate tangible outcomes.


The DevOps Research & Assessment (DORA) team has researched and surveyed thousands of companies, groups, and individuals over the past decade and publishes a State of DevOps report each year. See https://www.devops-research.com/research.html for more information. They conclude that a DevOps culture that is firmly understood and embraced positively impacts organizational performance and development culture while improving software delivery. Any framework that can increase profitability while keeping employees happier and less burned out deserves consideration. There is excellent content within these reports to help drive the transformation of any organization, but for this article, we’ll focus on the most effective performance measures for a software delivery team.

Outcomes, Not Outputs

Velocity is a common statistic to track for a team, but this number is only helpful for capacity planning. Once a velocity emerges over several iterations, teams should only use it to determine the amount of work to plan for the next iteration. It is not a metric for determining how well a team performs, let alone a fair comparison between teams, since velocity is always a relative measure specific to an individual group. For example, a team with a velocity of 53 could be less effective than another team with a velocity of only 18. Don’t be tempted to track a velocity trend over time, either. If a team knows it is measured on increasing velocity, it can simply inflate its estimates to look good.


Total lines of code written is another measure that does not indicate how a team is doing. Firstly, refactoring existing code can be an invaluable improvement to the codebase and should be encouraged to improve maintainability and reduce technical debt, yet it often removes lines rather than adding them. Secondly, measuring the total lines of code added encourages developers to pad their code unnecessarily to earn a better score.


These measures, and others like them, focus on the team’s output and give no insight into how the team’s work impacts the organization. Instead, let’s focus on how a team can positively influence the products or services offered. That way, we reinforce the correlation between the group’s efforts and the outcomes that benefit the company. Let’s look at the specific DORA metrics and evaluate how they achieve this.

Deployment Frequency

When measuring deployment frequency, we determine how often a team can deliver change to the end user. Any time we introduce a change, it comes with risk, but the risk tends to be small when the change itself is small. Therefore, if a team is delivering frequent deployments, it implies each deployment is small, which mitigates the risk.


It is common for organizations to be hesitant about changes to production systems, which usually means many approval gates before deployments reach the hands of users. If an organization has a culture of deploying once a month (for example), each deployment lumps together many features and fixes. When something goes wrong with one of those changes, it becomes harder to pinpoint the problem, and in the worst case, rolling back the deployment also removes the features that were working correctly. On the other hand, if each deployment contains fewer changes and occurs every couple of days (for example), troubleshooting becomes much quicker and rollbacks are less disruptive.
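
As a rough illustration, deployment frequency can be computed from a list of production deployment timestamps, for example exported from a CI/CD system. The timestamps and window size below are invented for this sketch:

```python
from datetime import datetime, timedelta

# Hypothetical production deployment timestamps (e.g. exported from a CI/CD tool).
deployments = [
    datetime(2023, 5, 1, 10, 30),
    datetime(2023, 5, 3, 14, 5),
    datetime(2023, 5, 4, 9, 45),
    datetime(2023, 5, 8, 16, 20),
]

def deployments_per_week(timestamps: list[datetime], weeks: int = 4) -> float:
    """Average number of production deployments per week over a recent window."""
    cutoff = max(timestamps) - timedelta(weeks=weeks)
    recent = [t for t in timestamps if t >= cutoff]
    return len(recent) / weeks

print(f"Deployment frequency: {deployments_per_week(deployments):.1f} per week")
```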

Lead Time for Changes

This metric measures the time it takes for code to make its way into the hands of end users. Typically the duration starts with the first commit of a branch or feature and ends with the deployment of that same branch or feature. A low turnaround time indicates that each change the team produces is small. And just like deployment frequency, if each change is small, so is the corresponding risk.
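
A minimal sketch of this calculation, assuming you can pair each change’s first commit time with its deployment time (the records below are invented):

```python
from datetime import datetime
from statistics import median

# Hypothetical records pairing a change's first commit with its production deployment.
changes = [
    {"first_commit": datetime(2023, 5, 1, 9, 0),  "deployed": datetime(2023, 5, 2, 15, 0)},
    {"first_commit": datetime(2023, 5, 3, 11, 0), "deployed": datetime(2023, 5, 3, 17, 30)},
    {"first_commit": datetime(2023, 5, 4, 8, 0),  "deployed": datetime(2023, 5, 8, 10, 0)},
]

def median_lead_time_hours(records: list[dict]) -> float:
    """Median time from first commit to deployment, in hours."""
    durations = [
        (r["deployed"] - r["first_commit"]).total_seconds() / 3600 for r in records
    ]
    return median(durations)

print(f"Lead time for changes: {median_lead_time_hours(changes):.1f} hours (median)")
```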


Unfortunately, as we have all likely experienced, a code change that lingers in any part of the development lifecycle for too long runs the risk of merge conflicts, increasing the overall development time. When the Lead Time for Changes metric is high, it can indicate that the team needs to break feature work into smaller pieces, a common problem for some groups and individuals. If that isn’t feasible, techniques like feature flags can bring this metric down. In any case, merging changes from all developers as often as possible is a hallmark of continuous integration. Finding ways to lower this duration directly supports a higher deployment frequency.


Deployment Frequency and Lead Time for Changes measure a team’s throughput. When an organization inevitably asks how teams can deliver faster, quality is almost always the first corner they try to cut. However, quality does not have to suffer from increased speed, nor should it. The following two metrics bring system stability into the picture. While bugs will occur in any system, failures should not become routine.

Change Failure Rate

The first step in measuring the Change Failure Rate for a software team is to define what a failure means. For example, a minor bug that impacts only a few users, or doesn’t affect their ability to use the system, should not count as a failure. This definition will vary between organizations and helps establish the level of risk the organization is comfortable with. Revisit the definition periodically, since the acceptable level of risk shifts over time.


Once we know what constitutes a failure, we can measure how often a change to the production environment results in one. If, for example, we make six deployments to production in a week, but three produce a system failure, the 50% Change Failure Rate indicates that the high deployment cadence comes at the expense of quality. High-performing teams keep the failure rate low, under 5%. There is no benefit to deploying frequently if a high failure rate negatively affects the end users.
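
The calculation itself is simple; here is a sketch using an invented deployment log, where each deployment is flagged according to the team’s own definition of failure:

```python
# Hypothetical deployment log: one boolean per production deployment,
# True when the deployment caused a failure per the team's definition.
deployment_outcomes = [False, True, False, True, False, True]  # 6 deployments, 3 failures

def change_failure_rate(outcomes: list[bool]) -> float:
    """Percentage of production deployments that resulted in a failure."""
    if not outcomes:
        return 0.0
    return 100 * sum(outcomes) / len(outcomes)

print(f"Change failure rate: {change_failure_rate(deployment_outcomes):.0f}%")  # -> 50%
```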

Mean Time to Restore

While nobody wants to see the system fail, it does happen from time to time. Mean Time to Restore measures how quickly a team can restore service when it does. Downtime is not something any organization wants, so a fast resolution time will always be desired. However, when a team can expeditiously and repeatedly identify the cause of a problem and turn around a solution, it speaks volumes about their ownership of the system.
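
Given incident records with a failure time and a restoration time (the data below is invented for illustration), the metric is simply the average outage duration:

```python
from datetime import datetime

# Hypothetical incident records: when the failure started and when service was restored.
incidents = [
    {"failed": datetime(2023, 5, 2, 14, 0), "restored": datetime(2023, 5, 2, 15, 30)},
    {"failed": datetime(2023, 5, 6, 9, 0),  "restored": datetime(2023, 5, 6, 9, 45)},
]

def mean_time_to_restore_hours(records: list[dict]) -> float:
    """Average time from failure to restored service, in hours."""
    durations = [
        (r["restored"] - r["failed"]).total_seconds() / 3600 for r in records
    ]
    return sum(durations) / len(durations)

print(f"Mean time to restore: {mean_time_to_restore_hours(incidents):.2f} hours")
```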


A strong developer culture will enable individuals to rally together to restore service. If they take pride in their work, they’ll want it up and running as soon as possible. To be effective, they’ll need access to logs, data, and other troubleshooting information; if they don’t have what they need, there may be more the organization can do to support them. A high Mean Time to Restore might not be a problem with an individual team at all. Digging deeper might reveal broader issues such as a poor developer culture or a lack of enablement from the organization.

Measuring the Balance

So many aspects of software development are about finding the right balance, and the DORA metrics capture this one well. While we should want to deliver features and improvements quickly, speed won’t be impactful if the system fails regularly. Conversely, if we wait until everything is perfect, we’ll always be playing catch-up with the competition and delaying critical feedback from our users.


The DORA metrics keep an eye on this balance for you. When a team measures well, it indicates they are quickly deploying stable changes to the system. If any one measure is off, it provides a chance to dig deeper with the team and examine what opportunities exist to improve their work. Finally, if all the measurements are well below what high-performing teams accomplish, it is time to examine how the organization itself approaches software delivery. Don’t be too quick to point fingers; there is much more you can do to enable the teams to deliver at the cadence you need.