Defining & Adopting Your Delivery Metrics

Continuous Delivery
UXDX Community LatAm 2021

Building great products starts with building high-performing teams. In this session Patricia will talk through how high performance starts with clear and tangible metrics.
She will touch on how metrics should be defined, measured and adopted by the team to ensure not only product success but also happy teams.

If you have any feedback for Patty, please share your thoughts through this link: https://docs.google.com/forms/d/e/1FAIpQLSci860OH8kZq0E_jxhRhSGRpUR7KHmKMadH2zBDuyi4x33YtQ/viewform

Hi, everyone. Hi, people. How are you? I'm Patricia Trejo; you can call me Patty. I'm the engineering director and head of software development at Cornershop by Uber, and today I'm going to talk about delivery flow metrics in software development. So first of all, what the heck are flow metrics? Flow metrics are tools to constantly measure the flow of the development process: to measure it, learn from it, and optimise it, and thus transform our development process and its results, hopefully, into predictable ones. Keep in mind that in their first days, most projects, regardless of their nature (waterfall, agile, lean, whatever kind of project), face the same situation: stakeholders start to ask certain kinds of questions. For example, how long does it take to deliver? Well, it depends. I don't have a crystal ball to see the future and say we're going to deliver at this exact moment. Another typical question is when are we going to finish the project, and it's the same thing: I don't know. Maybe you have a roadmap, some planning, some desirable dates, but that's not exactly what is going to happen. That's also why, instead of talking about projects, we want to talk more about products and how products evolve.

In that sense, based on the flow information we have from our process, we can build metrics that help answer some of those questions. For example, how long does it take us to deliver value to our customers? There is a super simple metric for that, called lead time. Lead time is the time between a card leaving the backlog and reaching production. And there is a question like, how much customer value are we delivering at any given time? That's another super simple metric, throughput, which is the number of cards, tasks or features delivered in a period of time. Based on these flow metrics, I want to share some types of information you can start to keep in mind to make decisions backed by objective data. I have split it into six types: monitoring and risk analysis, motivation and awareness, backlog analysis and prioritisation, behaviour analysis, projection, and a basis for the execution of experiments. I will talk a little bit about each one of them, starting with monitoring and risk analysis.
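
To make those two definitions concrete, here is a minimal sketch of how lead time and throughput could be computed from card data. The records and field names are illustrative, not from the talk; in practice they would come from your issue tracker's export or API.

```python
import numpy as np

# Illustrative card records; real ones would come from the issue tracker.
cards = [
    {"id": 6915, "left_backlog": "2021-03-01", "in_production": "2021-03-10"},
    {"id": 6916, "left_backlog": "2021-03-03", "in_production": "2021-03-08"},
    {"id": 6917, "left_backlog": "2021-03-08", "in_production": "2021-03-12"},
]

# Lead time: working days between leaving the backlog and reaching production.
# np.busday_count skips weekends (and holidays, if a list is passed).
lead_times = {
    c["id"]: int(np.busday_count(c["left_backlog"], c["in_production"]))
    for c in cards
}

# Throughput: number of cards delivered in the period the data covers.
throughput = len(cards)

print(lead_times)                        # {6915: 7, 6916: 3, 6917: 4}
print(f"throughput: {throughput} cards")
```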

First of all, let me explain this graph. It shows how much time a card spends in each flow state. We take the flow data from an issue tracking tool such as Trello or Jira, among others; that's the source of our information, our issue tracker. The vertical axis shows the card IDs and the horizontal axis shows the number of working days. That's super important: working days. It excludes holidays and weekends, for example, to give a clearer view of when the team is actually working. So, for example, card 6915 was one day in analysis, one day in ready to go, three days in progress, two days in QA and three days in review. If a colour is not shown, as you can see in some of the bars, it is because the card spent less than one working day in that state. Ok, so let's see an example. In this graph we can easily see some strange behaviours. For example, why did the cards located up here at the top take so long to be delivered? We are talking about more than 100 working days. That's super weird behaviour, while the cards that come lower down have much more normal behaviours.

They have a lifetime that does not exceed five days in general, except for some that reach up to 20 working days, but they are much smaller than the other ones in terms of their lead time, the time from when they left the backlog to when they reached production. But here is also something interesting: the card IDs on the vertical axis are ordered from smallest to largest, and in general IDs are assigned consecutively. So we can conclude that the cards that took a long time are the older ones, because they have smaller IDs. Indeed, in this particular case, the team didn't know how to use the issue tracker well at first, and therefore did not move the cards from one state to another when it was appropriate. That is how this unwanted, and in fact unreal, behaviour was generated, because those cards had actually already been finished. And in any case, there were still things the team had to work on: making blockages due to dependencies visible, breaking the work into smaller tasks rather than a single big one, carrying out the PR reviews as soon as development finished, among others. That is the kind of thing a team has to start asking itself: what is happening, and how can we improve, to make the flow, first of all, real, and then more efficient.
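
A sketch of how such a graph can be built: the working days a card spends in each state can be reconstructed from the status-change log that most trackers expose. The shape of the transition log below is an assumption for illustration; it mirrors the card 6915 example above.

```python
from collections import defaultdict
from datetime import date

import numpy as np

# Illustrative status-change log for one card: (date entered, state).
transitions = [
    (date(2021, 3, 1), "analysis"),
    (date(2021, 3, 2), "ready for development"),
    (date(2021, 3, 3), "in progress"),
    (date(2021, 3, 8), "QA"),
    (date(2021, 3, 10), "review"),
    (date(2021, 3, 15), "done"),
]

days_in_state = defaultdict(int)
for (entered, state), (left, _next_state) in zip(transitions, transitions[1:]):
    # Working days only, as in the graphs: weekends (and holidays) excluded.
    days_in_state[state] += int(np.busday_count(entered, left))

print(dict(days_in_state))
# {'analysis': 1, 'ready for development': 1, 'in progress': 3, 'QA': 2, 'review': 3}
```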

That way the team can deliver value, and deliver parts of the product, constantly and efficiently. Then here we have another case. What do you see that looks strange in this graph? Well, this, right: we have just one day in development, in progress, and then a lot of days in QA. So why is that the case with these cards? Why does the QA process take so much time? What has been happening, or what is happening, given that the card has not been delivered yet? We can build these graphs for cards that are still in progress and not yet delivered too. That kind of thing is monitoring and risk analysis, because here you can see risks in what is happening along the development process, the product delivery, or within the team. Maybe the team is demotivated, I don't know; there could be a bunch of things happening, and we monitor to see whether our flow is sufficient and whether we are delivering at the right time, etc. The second type of information is motivation and awareness. Here we have, for example, a super simple graph that shows throughput: the number of cards or tasks delivered in each week of a year. So why does throughput vary so much from week to week?

At the beginning it was around seven or eight cards per week; then it started to grow, and now it is decreasing a little bit. In the earlier weeks a lot of features, final features for our end users, were delivered, but in the later weeks we don't have that many features, and chores, technical tasks, have grown. And in the last week a lot of bugs were delivered. What is happening? Is our development process not being followed? It could be that we are not doing some technical things very well and are creating more bugs, or that the specification of our tasks is not the best, so we are creating and then delivering more bugs. Things like weeks where many bugs were delivered instead of features are the kinds of questions you can ask yourselves to start wondering what is happening, what could be done better, and what we can do as a team. This is another example; it shows working versus waiting time. Working time is the time that someone is effectively working on a card or task. For example, in your issue tracker you have columns like in development, in QA, in review, and those mean that someone is working on that card, doing the development, doing the QA, doing the review. That's the working time.

And there are other states that are waiting times, or waste times if we use more lean vocabulary. Those are states where cards are waiting for someone to take them, for example ready for development, ready for review, ready for deployment. So here you can see, again for each one of these cards, the card IDs and the working days: how much time someone was working on that task and how much time the card was waiting for someone to take it. You can see that there were cards that spent a lot of time waiting, while the later ones were much more efficient: they had very little waiting time and very little working time. This is about motivation and awareness because, again, it shows which things we can change, but motivation especially, because sometimes when we work so hard on a task and that task doesn't get delivered, we get demotivated. Where is the work I have done? So if our work waits too long to reach production, that can demotivate the team, and it's super important to keep that kind of thing in mind.
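
Here is a minimal sketch of that working-versus-waiting split: classify each state as working or waiting and sum the days in each bucket. Which states count as waiting is a per-team decision, and the state names here are illustrative.

```python
# States where a card sits waiting for someone to take it (illustrative names).
WAITING_STATES = {"backlog", "ready for development",
                  "ready for review", "ready for deployment"}

def working_vs_waiting(days_in_state):
    """Split a {state: working_days} mapping into (working, waiting) totals."""
    working = sum(d for s, d in days_in_state.items() if s not in WAITING_STATES)
    waiting = sum(d for s, d in days_in_state.items() if s in WAITING_STATES)
    return working, waiting

card = {"ready for development": 4, "in progress": 2,
        "ready for review": 5, "review": 1, "QA": 1}
print(working_vs_waiting(card))  # (4, 9): mostly waiting, worth discussing
```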

And a final example where, again, you can probably spot some weird behaviours easily. Why, if we have cards that take very few days in development (here one day, one day, one day, two days, three days), do we take so long to review their PRs? Seeing this graph, there seems to be a behaviour we should discuss in the team to see how to improve our development process. Maybe we are not doing the PR reviews the correct way; or maybe we are iterating too much, because there were a lot of open definitions when the card was developed, and now we are iterating between review and development; or maybe the ones who do the reviews are on vacation, I don't know. It could be a lot of things, and each team has its own context, so each team has to start asking itself what is happening here. But usually a PR review shouldn't take longer than the days the development itself took, so these are super weird behaviours. That kind of thing is also for awareness, but also for the motivation of the team: maybe the team doesn't know very well how to do PR reviews, I don't know, it could be a lot of things. So that is the kind of thing for this second type of information. And then the third one is backlog analysis and prioritisation.

In this case there is a new chart, a new graph I'm presenting here, called a burn-up. It shows, in a cumulative way, the delivery of cards by the team and the cumulative growth of the backlog: the delivery is the red curve and the backlog is the blue one. It is necessary to constantly monitor the delivery behaviour of the team versus the growth of the backlog. In this graph we have two important insights. On the one hand, the delivery behaviour of the team has decreased: it was going super well, and now it's a flatter curve. On the other hand, the backlog has started to grow a lot. Both effects mean it will take a very long time for the team to finish the delivery. And let's face it, the backlog always keeps growing, so if we look at it from the perspective of a project, it may never finish. Ok, so what to do? Well, on one hand, we have to see what happens with the constant growth of the backlog. Maybe we should do some review of the backlog, some backlog grooming: really define which release each card belongs to, align development expectations with the stakeholders, move cards, archive cards, I don't know. On the other hand, we have to ask why delivery has dropped. And it's necessary to be clear that this burn-up is built only on the basis of the number of cards, not points, although we could also do it with points.

Could it be that the most recent cards are bigger, so they take longer? Could it be that it's vacation time and we have less development capacity in the team? Could it be that the features to develop are less defined, and therefore we iterate a lot before they can be delivered to production? Could it be that we have reached a point where our cards are simply not well written or defined? It could be, again, a lot of things, a bunch of different things, and that will change depending on the context of the team. So unfortunately there is no one-size-fits-all answer, because it always depends on the team's context and reality. And here is our second sample. In this case, the graph shows cards that haven't been finished yet, and the red colour represents the time they have been sitting in the backlog. So why do we have so many cards in the backlog, some for up to 40 working days? Maybe we should do some backlog grooming again and see whether it still makes sense to have those cards there. Maybe yes, maybe not; maybe some have already been superseded by other cards and we must archive them, or maybe it is fine to keep them in the backlog.
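
As a sketch of that backlog-age check, the red bars can be computed by counting the working days each unfinished card has sat in the backlog and flagging old ones for the next grooming. The cards, dates and threshold are illustrative.

```python
import numpy as np

today = "2021-06-01"
backlog = [  # illustrative: (card id, date it entered the backlog)
    (7001, "2021-03-29"),
    (7002, "2021-05-17"),
    (7003, "2021-04-12"),
]

GROOMING_THRESHOLD = 40  # working days; pick whatever fits your team

for card_id, entered in backlog:
    age = int(np.busday_count(entered, today))
    flag = "  <- review in the next grooming" if age >= GROOMING_THRESHOLD else ""
    print(f"card {card_id}: {age} working days in backlog{flag}")
```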

Those kinds of reviews are also necessary to do constantly. Now, the fourth type of information is behaviour analysis, and here I'm going to show you a new graph based on the lead time concept. Remember the concept I told you at the beginning of this presentation: lead time is the time between a card leaving the backlog and reaching production. This graph shows how lead time behaves: how many cards had each specific lead time, on the horizontal axis. For high-performing teams, we would love most cards to have a short lead time. This first snapshot is after the first four weeks of a team, and we don't have too much information yet, so we're not able to say anything at this point. But if we keep tracking this behaviour around the fifth week, tenth week, fifteenth week and twentieth week, we can see that a pattern appears. We can see that most of our cards tend to have small lead times, which is awesome, because that means we are delivering value fast for our product, and also because the shape is similar to a well-known statistical distribution.

But that is more of a statistical thing we can talk about in another presentation, at another moment. Over time, we begin to see that there are some peaks, some local maximums, that we can keep in mind to do estimations, for example, or analyse to improve our process. And if we see that some local maximums start to grow out here, at very large lead times, maybe we should make some improvements to our process and start asking ourselves what is happening. Maybe we are creating tasks that are too big, and we could break them down, split them into smaller cards that reach production and bring value faster. As I said, we can see several local maximums in this aggregated curve, the black dotted one: several cards are delivered at four days, several at six days, then at eight, then at 11, and then at 14 days. This could be useful if the team estimates; maybe instead of estimating we could do T-shirt sizing using this info, saying a small card, a small T-shirt, is four days, a medium T-shirt is six days, a large is eight days, an extra-large is 11 days and a double extra-large is 14 days, for example. And remember, it's always working days.
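
As a sketch of how those local maximums could be read off automatically: build the lead time distribution and keep the lead times that are at least as frequent as their neighbours. The lead times below are illustrative, and this simple peak rule is only one possible choice.

```python
from collections import Counter

# Illustrative lead times in working days, accumulated over several weeks.
lead_times = [4, 4, 4, 3, 4, 6, 6, 5, 6, 8, 8, 7, 11, 11, 14, 14, 4, 6, 8]

histogram = Counter(lead_times)
for days in sorted(histogram):
    print(f"{days:>2} days: {'#' * histogram[days]}")

# Local maximums (seen at least twice, no rarer than their neighbours) can
# anchor T-shirt sizes, e.g. S=4, M=6, L=8, XL=11, XXL=14 working days.
peaks = [d for d in sorted(histogram)
         if histogram[d] >= 2
         and histogram[d] >= histogram.get(d - 1, 0)
         and histogram[d] >= histogram.get(d + 1, 0)]
print("candidate sizes:", peaks)  # [4, 6, 8, 11, 14]
```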

The fifth type of information is projection, and in this case I will show you the burn-up graph again, but now with two new curves, the green and the yellow ones. The green and yellow curves are projections based on Monte Carlo statistical simulations, which in turn are based on the team's delivery behaviour (the red curve) and the existing backlog (the blue curve). So the yellow and green curves show, based on that information, when this backlog could be fully delivered. First of all, we have some problems here: the backlog will probably continue to grow, the behaviour of the team may vary over a long period, and there is little delivery information yet. That's why the yellow and green curves do not even manage to intercept the blue curve in this graph. This graph shows the burn-up of a team that has been working on a project or product for a short time, but we already know that this backlog grew a lot from the beginning.

It has almost 500 cards, which sounds more like a traditional project than a lean one, for example. And this one is the same kind of graph, but it shows a team that is already finishing the delivery of a product. That team was constantly reviewing, cleaning and prioritising the backlog over time, so when the backlog grew too much, we started to prioritise against real expectations, and the backlog kept growing, but not in a super heavy way. That made the difference between the delivery, the red curve, and the backlog remain more or less constant. And there comes a point where the backlog has stopped growing, so the end is near; we can see it. In addition, the forecasting algorithm is already well fed with delivery information, so the yellow and green curves are almost the same, and that means the uncertainty about finishing the backlog sits within a small range of possible dates. These two curves are based on different confidence percentages: the green one has 50 percent confidence while the yellow one has 70 percent confidence. That's why, in the earlier graph, the simulations were quite different. And regarding this, be careful: we cannot make projections with 100 percent confidence, because that would be like having a crystal ball that predicts the future, and reality does not work like that.
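
For intuition, here is a minimal sketch of the kind of Monte Carlo simulation behind such curves: resample the team's weekly throughput history to estimate how many weeks the remaining backlog could take, then read the answer off at a chosen confidence level. The history and backlog size are illustrative, and a real simulation would also model backlog growth.

```python
import random

weekly_throughput_history = [5, 7, 8, 6, 3, 9, 4, 6]  # illustrative
remaining_cards = 120
simulations = 10_000

outcomes = []
for _ in range(simulations):
    remaining, weeks = remaining_cards, 0
    while remaining > 0:
        # Assume a future week behaves like a randomly drawn past week.
        remaining -= random.choice(weekly_throughput_history)
        weeks += 1
    outcomes.append(weeks)

outcomes.sort()
p50 = outcomes[int(0.50 * simulations)]  # the "green curve": 50% confidence
p70 = outcomes[int(0.70 * simulations)]  # the "yellow curve": 70% confidence
print(f"50% of runs finish within {p50} weeks, 70% within {p70} weeks")
# Never report 100% confidence: that would be the crystal ball again.
```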

By now we all know it doesn't work that way. So then we have the final type of information, which uses flow metrics as a basis for the execution of experiments. Why, and what for? We can always run experiments in our teams to see which things fit better than others, or whether some behaviours improve when we change certain things. The idea is to evaluate the state before and after an experiment with data, and this is super objective data, and from that start to see new ways of doing things and opportunities for improvement inside the team. So here is a real example. I didn't say it at the beginning, but all the graphs shown here are real; they come from different kinds of teams and different kinds of products that we have been developing for years. So everything is real, and this example is also real. Once upon a time there was a hypothesis where someone said: why have pairs of developers working on a single card? If each of the developers took a card, we would go twice as fast. So, instead of just dropping pair programming, we started to do a lot of analysis. I will be very brief here.

I will show just a few small examples. This graph, for example, shows a person who in the first and third weeks was working alone, while in the second and fourth weeks they were pair programming with someone else. The average development time of the cards that person worked on in a given week was in some cases shorter when working alone than when pairing, and in other cases longer, and the same happened when working in pairs. So there is no pattern to see here about whether features take longer when working in pairs or not. And the second scenario is for a different person, one who almost always worked pair programming, and the same thing happens again: there is no pattern in the average time a task takes; the tasks are simply different. Furthermore, there is no pattern that says whether pairing takes you longer or not, so you cannot conclude that the initial hypothesis is true. And well, as I have already told you, this is just a limited, real but not complete example, because there were a lot of cases we analysed. In fact, we repeated this same analysis at the beginning of the team's work on the product, again when they had already been working together for some months, and then almost at the end of the delivery of the product.
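
A minimal sketch of that comparison, assuming per-week lists of development times for solo weeks and pairing weeks; the numbers are made up for illustration, not the experiment's real data.

```python
from statistics import mean

# Illustrative development times (working days per card) for one person.
solo_weeks = {"week 1": [2, 3, 1, 4], "week 3": [1, 1, 2]}
pairing_weeks = {"week 2": [2, 1, 3], "week 4": [4, 2, 2, 1]}

for label, weeks in (("solo", solo_weeks), ("pairing", pairing_weeks)):
    for week, card_days in weeks.items():
        print(f"{label:<8} {week}: avg {mean(card_days):.1f} days/card")
# Repeated across many people and periods, no consistent difference appeared,
# so the "each developer alone goes twice as fast" hypothesis was not supported.
```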

And we never, ever found any pattern at all. Never. We tried to compare small tasks with other small tasks and bigger tasks with other bigger tasks, and we never found any pattern. So, at the end of the day: if you can make decisions based on facts rather than forecasts, you get results that are more predictable. "Lean development is the art and discipline of basing commitments on facts rather than forecasts." That is a quote from Mary Poppendieck, from a paper she wrote called Lean Development and the Predictability Paradox; if you have the chance to take a look at it, I fully recommend it. And just as a final recap, some important points. First of all, make decisions based on facts rather than forecasts. Please, if you have facts, if you have data, it is much better than saying, I don't know, I think this could work, or I think we are not doing things right. So please, if you have data, use it; it's a super powerful tool. Second, almost all issue trackers have an API, so you can integrate with that API, extract the data, analyse it, see which behaviours are present, and start doing the analysis yourselves.
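
As an example of what that integration can look like, here is a sketch using Jira's REST search endpoint with changelogs expanded. The base URL, project key and credentials are placeholders, and other trackers expose equivalent APIs.

```python
import requests

BASE_URL = "https://your-company.atlassian.net"  # placeholder

resp = requests.get(
    f"{BASE_URL}/rest/api/2/search",
    params={"jql": "project = MYPROJ", "expand": "changelog"},
    auth=("user@example.com", "api-token"),  # placeholder credentials
)
resp.raise_for_status()

# Each status transition says when a card moved between flow states, which is
# all the raw material the graphs in this talk need.
for issue in resp.json()["issues"]:
    for history in issue["changelog"]["histories"]:
        for item in history["items"]:
            if item["field"] == "status":
                print(issue["key"], history["created"],
                      item["fromString"], "->", item["toString"])
```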

Third, start with the simplest metrics, such as throughput and lead time, as I have already told you. Then continue by splitting those simple metrics across different scenarios: for example, instead of only looking at the macro throughput, look at it for each team or for each kind of card or task. Lead time and throughput are super simple, and you can get a lot of information from them. Fourth, show your team the metrics you obtain and all the interpretations, so that together you can make decisions and improvements. Don't leave this to just one person, the team leader for example; try to involve the team, and the team will also gain more ownership. And finally, train your stakeholders on the meaning of those metrics, behaviours and analyses, so they are aligned, understand what you are doing, and understand the information you're going to show them, for example related to projections; that will also calm them down regarding when the project is going to end and that kind of thing. So again, keep the data in mind: you have a lot of information in there. Thank you all. Before you go, please take one minute to tell me how it went; I will share the link to the feedback form. And here is also my LinkedIn handle if you want to talk even more in the next few days. Thank you.