How We Manage Releases at Minecraft

Continuous Design
UXDX Europe 2020

During the past year we've explored a new approach to Minecraft release management, with concepts such as Minimum Lovable Product and four levels of Done.
In this talk I'll show how it works and what we've learned.

Hello, I’m Henrik. I work on the Minecraft gameplay design team and do a mix of gameplay design, feature development, and team coaching. I’m going to talk a bit today about how we manage releases and a little bit about how we do design.

So first, a bit of context: Minecraft is a game. It’s been around for 11 years or so, and it’s been growing pretty much continuously ever since the start. Now we have more than 120 million active players, of pretty much all ages. This game is actually two games, because technically there are two codebases behind it. There’s the Java Edition of the game, which runs on Windows, Mac, and Linux, and it’s built in, well, Java. And then there is the Bedrock Edition of the game, which runs on Windows, mobile, and consoles. So, these two are trying to pretend to be one game. We want it to feel like it’s one single game, and we want as much feature parity as possible. But for historical reasons, it’s two codebases.

When we release the game, we tend to make larger releases about once or twice per year, and we usually give them a theme of some sort. So for example, Update Aquatic was all about improving the oceans and making them more fun and interesting. In 2019, we did the Village and Pillage update, which was all about making villages more fun. And then the latest release, the Nether Update, was all about making the Nether dimension more fun. So, we tend to have this kind of high-level focus and improve one aspect of the game for each release.

However, we don’t ship them as big bang releases. Instead, we deliver small increments that we call snapshots or betas. We have three teams working in two-week sprints, and every week we ship a snapshot or a beta. A snapshot is pretty much the latest and greatest: whatever we’ve built up until now, we ship it, and players can opt into that. What those players get is of course the latest, coolest stuff, but it might be a bit unstable and it’s definitely not complete. So, it’s a bit of a partnership, because the players who are playing on the snapshot (and that’s quite a lot of players) give us really useful feedback, and then we adapt to that feedback and improve the product. And by the time we get to the actual release, we typically know that it’s going to be a success, because we’ve already had so many chances to improve it. So, we aim for the clouds, we want to make a big impact with the game, but we ship in small increments to make sure we’re always learning.

So, just like any organization, we have a lot of challenges, and here are just some of them. Scope management: we have fixed-ish dates for our big releases, and we of course want to maximize the amount of fun that we can put inside each release, so how do we manage that in practice? Dependencies between features: dependencies are sometimes pitched as something bad for development, something you want to minimize. But in our case, dependencies are actually good. We thrive on dependencies. Say you have three different features: a new item, a new mob (a mob is just game speak for a creature), and a new structure, like a hidden temple or something. If you have to go to the mob and trade with them to get the item, and that item unlocks the hidden temple, then one plus one plus one equals ten, because these three features together form a system, which makes the game really fun. So, we like dependencies, but of course we have to manage them in some way. Java and Bedrock parity is a challenge, because we want to ship on the same date on both platforms, and they should contain the same features, and those should work in the same way. That’s hard because some features are easy to build on one platform and hard to build on the other, so how do we manage that in practice? Then there’s the thing about our players. Minecraft is more than just a game. It’s almost like a game engine or a platform, because you have players building mods on top of it or building mini games. We have different play styles too. We have people who are adventuring in Minecraft, building fantastic architectural masterpieces, or just socializing, or perhaps building massive machines. So, people play in very different ways, and we want to cater to everybody and make sure we don’t leave anyone out. These different players all love Minecraft, but for different reasons.
And of course, that leads to a lot of change. As we get feedback from these different types of players, we want to adapt to that feedback. And then of course, how do we handle that and how do we stay aligned in the face of all this change when we’ve got multiple teams. So, yeah, we have a lot of challenges and I could go on and list more, of course.

But I’d like to start by talking about design and share some of my thinking around that. Ideas are never the bottleneck. We always have an infinite number of ideas. And we have this release, which is like a bucket, and we can only fit so much. So, the challenge is, out of all these billions of ideas, what are we going to put in the bucket, right? Because some ideas are better than others, some are more costly than others, and we have to figure this out. So, the challenge is really figuring out what not to build. I used to work at Lego, and they refer to their design process as kind of like an idea killing process. Because again, ideas are free, they come from all over the place. The challenge is to figure out what not to build, or at least what not to build right now. So, yeah, choose wisely.

So, how do we do this? How do we find the awesome? We want to build features that are fun, but also technically feasible and ideally fit the theme, as well. So, in the perfect case, we find the diamonds that fit all three. But we’re actually okay with maybe gold-like features that are fun and technically feasible, maybe don’t fit the theme. But we at least try to find features that, for the most part, match the high level theme of that update.

As a designer, we need to wear two hats, both gameplay and tech. In some organizations, these are different roles, but in our organization as a designer, I’m actually both designing and coding. So, I’m building the feature that I designed. Of course, in collaboration with other people, but there’s no handoff. It’s not like now you take over and now you build this feature properly. I think that’s actually really useful because that means as a designer, I have to take both kind of aspects into account. Ideally, gameplay should not be limited by tech. We should adapt the technology to all the great ideas we have, right? Technology should serve our vision, but we also want to be kind of smart. So, when we are prototyping, we are also looking for like, “Oh, what’s hard to do. What’s easy to do?” And then, adapt the design to that, so we kind of go both ways. And that’s what lets us find this kind of great design, which is features that are really fun and aren’t like unnecessarily hard to build.

And if we take those two aspects and put them on a graph, then we can put gameplay value vertically, and technical cost, risk, and uncertainty horizontally. Then we have this line, and of course we want to be above the line. We want to find the features that live up to the left. Features down here, down to the left, could be useful. This could be a low-hanging fruit, some little thing that is easy to do but that’s going to make at least some player segment really happy. It might still be worth doing. So, low gameplay value doesn’t necessarily mean it’s not fun. It could also mean that it’s really fun, but for one small player segment. Similarly, sometimes it’s worth making a major technical investment if the gameplay is that awesome, right? So, that’s fine. But what we try to avoid is this land of meh down here, right? A meh feature like this one could be great, maybe players like it, but it just wasn’t quite worth the cost, right? We could have spent that time building better features.

So, when we are brainstorming a release and the features that are going to be in it, we have this very vague, high-level sense of where each feature is on this spectrum, right? So, Feature A might be up here, Feature B down there, and Feature C over there. And if we take that bucket into account, then yeah, Feature A, we’re going to keep that. Feature B, of course, we’re not going to build. And Feature C, well, maybe, if it can fit in the bucket, right? We’ll see. So, this is like a tool for us to prioritize.

And in practice, if you look at that journey, a lot of it is driven by prototyping, because prototypes are really useful. When I have to make my feature actually run in the code itself (I’m not talking about paper prototypes, I’m talking about actual running code), that forces me to think about both the gameplay and the tech. Because even though I’m taking all kinds of shortcuts and doing hack fixes because it’s just a prototype, even so, I am getting a sense of, “How hard is this to do properly? What is it going to take?” So yeah, again, two hats, right? Gameplay and tech, and prototypes let us wear both.

So, let’s say I’m building a prototype for something. Let’s say I want to build a “Ride a Dragon” feature, right? A new feature, you can ride dragons in Minecraft. Super hypothetical. So, I have an idea of how that might work, and I code a prototype for it. That’s my hypothesis. And as part of that, maybe I’ll learn that, “Oh, this was actually really tricky, especially the part with a player controlling a flying entity. That’s really complicated.” So, then I can maybe make some compromise and say, “Well, what if the player can’t control the dragon? I can sit on it, but the dragon decides where to go. Maybe that’ll be kind of fun too, and a lot easier technically.” So, I make a new prototype, and that’s my hypothesis, right? That it’s going to be easier to do and still kind of fun. And maybe that hypothesis gets mostly validated. So, okay, this is promising. Now, I might feel that it’s time to put it inside a snapshot. And that basically means giving it to the players. And when I give it to the players, I will almost inevitably be surprised, because they try stuff and they use it in different ways than I could’ve imagined. Maybe they set up automated transport systems using dragons, and I had no idea that was even possible, right? So, put stuff in the hands of players, get surprised, learn from it, and almost inevitably get ideas for how to improve the feature. So, I have a hypothesis that if I make this small change, maybe I make it possible for dragons to carry items effectively. I can put bags on them or something, I don’t know. A sled, like a Santa Claus dragon, I don’t know. And my hypothesis is that this can be done without a lot more technology, so then I build another snapshot. Of course, now I’m not prototyping anymore, I’m writing production code. Because snapshots are production code, they’re just not polished and complete. And then I release that, et cetera.
This is what I mean by exploring the design space of a feature. We’re kind of bouncing around here and trying to find the awesome, right?

And sometimes it leads in another direction, right? Sometimes I’ll be hypothesizing and prototyping, and then finally I’m like, “You know what? Yeah, it’s going to end up there.” It’s going to end up in the graveyard of darlings, right? Kill your darlings. And it’s not alone, there’s a bunch of darlings in that graveyard, features that once felt promising and, yeah, didn’t quite pan out. Maybe it is an okay feature, but it’s just that there are other features that deserve that spot inside the bucket more than this one. As opposed to reality, these darlings are not permanently dead. They’re like zombies. They may come up for some future release like, “Henrik, you buried me last release and now I’m back again. Please put me in the next release.” And I might be like, “Hmm. You know, maybe. Eh, maybe not.” Then I’ll kill it again and put it back in the ground, right? But they’re not permanently dead, ideas can come back again and that’s perfectly fine. So yeah, it can go both ways.

All right, so I’d like to share a little bit about the practicalities of how we manage this way of working as a team, or as three teams. The thing is, whenever you have multiple teams, it’s very easy to get misaligned. What I mean by that is, here, this team is trying to solve the problem of “we need to cross the river”, right? So, they’re building a bridge, because that’s their hypothesis for “what’s a good solution for this?” But what they don’t know is that there’s another team on the floor below, and they’re also trying to solve the problem of “we need to cross the river”, but their solution is different, right? And then maybe the two don’t match each other. And now we have a problem: we have misalignment and frustration and waste.

So, this tends to happen quite a lot when you have many teams trying to work together. And there is somewhat of a universal remedy that I find works most of the time, which is to just make everything really visual. So, simple things like this: To Do, Doing, Done. Some would call that a portfolio board, right? Features, priorities, which team is working on which, et cetera. In most cases, this layout would actually be sufficient. However, this didn’t quite suit us, because what does Done mean? Is it done when we’ve put it in a snapshot and started getting player feedback? Or is it only done when it’s releasable? And what about the two editions? What if Java is done and Bedrock is not? It’s hard for us to get an overview if we simplify it down to just one Done.

So, we’ve introduced a model which we call the Vanilla Dashboard. Vanilla is just an internal term we use to refer to the game itself, Minecraft. And it’s really the same high-level concept as what I just showed, but more adapted to our needs. This is what it looks like. There’s my team lead, by the way, she is awesome. And if you were to come to our office (before COVID shut it down), you would see this massive thing on the wall. In fact, this is only half of it. The other half is behind there. A huge, massive visualization of what’s going on. And you’ll see people gathering in front of that board, having animated discussions and making decisions, and things like that. And you’ll see feature cards, and you’ll see weird concepts like a minimum lovable release, et cetera. So, I’m going to go through what this is, how it works, and why we made the system.

The purpose of the Vanilla Dashboard, which is both a tool and a process, is to help us stay aligned. And what it gives us is less stress, right? Because people get stressed when they don’t know what’s going on, and that causes sub-optimization. It also gives us a better game, because if we are aligned, we can move faster and make better decisions. And it does that by giving us realistic expectations. We can see what’s going on, right? How fast we’re moving. It’ll trigger discussions and it’ll reveal problems. Visualizations like this don’t solve problems. But by visualizing them really clearly, they help us detect problems early, and then we can often fix them before it’s too late. And it makes change easier, because if the plan and the status are super visible, it’s easier to walk up to the board and move something or change it because we learned something new.

So, there it is in its full glory. And again, this is only half of it. And then COVID came along, and of course we all went home. We became a distributed team with no access to the physical wall anymore, so we of course digitized it. So, here’s the digital version of our board. It looks pretty much the same. We use a really great tool called Miro to make this board. All right. But how does it work, right? What do the things mean? Let’s decode it. At the top, you see features. Each card is a feature, and they are prioritized from left to right. So, leftmost is the highest priority. On the card, we have the name of the feature, some concept art, and then the “Why”. Like, why are we building this feature? In this case, we’re building the respawn anchor because we want people to be able to live in another dimension. It’s really important to keep that Why, because as soon as we start forgetting the Why of a feature, that’s when we start falling into mediocre design. So, it’s really important to always remember the Why.

Yeah, so each column is a feature, and they go from left to right. And they are grouped into what we call the MLP, the Minimum Lovable Product. There’s like a red line hanging on the wall, and everything to the left of it is included in the MLP. And it means pretty much what it sounds like. It is basically the kind of minimum content of the bucket. Without these things, we don’t really have a release that we could stand for. So, we don’t call it minimum viable because we don’t want minimum viable, we want minimum lovable. So, something we can stand for, if we were to ship this and get these things in and nothing else, we would still be proud of the release.

However, we are not aiming to release the MLP, we’re aiming to release more. So, this is really a minimum. Therefore, the MLP should be small enough that our gut feeling is that this is ridiculously small, that we can finish this in half the time. Obviously, we’re going to be wrong. Things always take longer than you think, right? But by really, really, really making it minimum, we increase the likelihood of finishing the whole MLP plus more. And so far, we’ve managed to do that.

And the guiding question for deciding if a feature is part of the MLP or not is, “Would we delay the release for this feature?” So, if this feature is not done by the release, would I rather move the release date to get this feature in? If the answer is Yes, then it’s MLP. If No, then maybe it’s beyond MLP. So again, we aim to ship more than the MLP, but we don’t bother trying to predict how much more. Instead, we just say, “At least this, and then as much as possible of this.” And because we humans are seemingly incapable of making estimates that are correct, we don’t bother trying to estimate how much of that we’ll finish. Instead, we’ll do as many as we can, and we’ll see how many we end up with.

Furthermore, each feature itself has what we call an MLF, a Minimum Lovable Feature. It is the subset of the feature that would make that feature lovable. So, it’s kind of like the MLP, but within one feature. For example, here’s the Piglin mob, right? The MLF for us was: you can barter with them, they can be distracted by gold, they hunt hoglins, and they attack players who aren’t wearing gold. So, for reasons, that’s what we considered to be the minimum for them. If we take any of this out, the piglins would kind of lose their core purpose in the game. And the guiding question when trying to decide if something is MLF or not is, “Would we risk the next feature for this sub-feature?” So, let’s say for example, baby piglins riding baby hoglins, that’s a really nice thing. But if push comes to shove, I would rather we ship the next feature and not have baby piglins ride hoglins than risk delaying the next feature. So, with that in mind, we would put that kind of thing in a separate note called, let’s say, Piglin extras or whatever. And those can be put on the wall as well, but they go after the MLP, and then they get prioritized in comparison to other things that are also outside the MLP. So, this is how we manage scope creep. Scope creep can be okay, but it should be deliberate. And we shouldn’t be swelling the minimum lovable product side of it. We should have that after.

All right, so back to the concept of Done. I mentioned that we didn’t want to have just one definition of Done. So instead, we split it into four levels of Doneness. The first level is Designed. When a feature is designed, it means we know what the MLF is. Through prototyping, we’ve concluded that this is what makes the feature awesome, and that these other things would be great, but they can be done later, right? We also have a sense of the technical constraints or needs for this feature. And we’ve synchronized across both platforms like, “Okay, is this realistic? Can we do this?” So, it’s not like a frozen design specification. It’s not like we’re ever completely done with design. When we say that design is done, it really just means done enough for now, like we can move on. And Designed is shared across both platforms, because we design for both platforms. But the next three levels of Done are tracked separately for each platform, because we’ve got to do them on both codebases, and they might be at different states.

So, Runnable means we have something running. Like, I can look at it, I can play this feature. It might not be stable yet, it might not be complete, but I can essentially play the feature. We actually renamed that to Playable recently. Snapshottable means the feature is in a state where we could put it in a snapshot. When a feature is in that state, it essentially means it is in the hands of our players or will be within a week. That’s a really important milestone, because that’s when we start getting real feedback, right? And Releasable of course means: if the release date is tomorrow, would I be okay with this feature being included in the actual final release? If the answer is Yes, then okay, it’s releasable. So, this is how a feature goes through these different stages, in a sense. And they’re not super distinct. Even during the Releasable stage, we might go back and make improvements. So, a blue note is Work in Progress. In this case, we’ve already snapshotted the Crimson Forest, but we are in progress trying to make it releasable. So, fixing the last critical bugs, tweaking things, right? A green note means Done, or done enough for now. And again, it doesn’t mean done forever. And then we can sometimes annotate to mark problems, like “I’m blocked here, I need something from somebody”, et cetera. So, that’s one of the purposes of these visualizations: to make problems visible, so we can address them and do something about it. We do a little bit of estimation, but we don’t spend a lot of time on it at all. Sometimes it’s useful to mark a feature as large or maybe small or medium, just to get a sense of, “Okay, this is going to take a lot more time than that one.” It helps for planning and prioritization, but our estimation is very, very lightweight.
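
For readers who think in data structures, the four levels of Done can be sketched as a small Python model. This is only an illustration under stated assumptions, not the team's actual tooling (the real board is sticky notes on a wall or in Miro). The names `Status`, `FeatureCard`, `mark`, `green_notes`, and `is_releasable` are hypothetical, as are the placeholder "why" text and the choice of Java as the platform in the example. The model assumes one shared Designed note plus three per-platform notes for each of the two editions, which gives the seven notes per feature column mentioned later in the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in progress"  # blue note on the board
    DONE = "done"                # green note: done enough for now


PLATFORMS = ("java", "bedrock")
# Levels tracked separately for each platform, in board order.
PER_PLATFORM_LEVELS = ("playable", "snapshottable", "releasable")


def _empty_levels():
    """One TODO note per (platform, level) pair: 2 x 3 = 6 notes."""
    return {(p, lvl): Status.TODO for p in PLATFORMS for lvl in PER_PLATFORM_LEVELS}


@dataclass
class FeatureCard:
    name: str
    why: str                          # the "Why" written on the card
    designed: Status = Status.TODO    # shared across both platforms
    levels: dict = field(default_factory=_empty_levels)

    def mark(self, platform, level, status):
        """Set the note for one platform and level."""
        if (platform, level) not in self.levels:
            raise KeyError(f"unknown platform/level: {platform}/{level}")
        self.levels[(platform, level)] = status

    def green_notes(self):
        """Count green notes in this column (at most 7)."""
        notes = [self.designed, *self.levels.values()]
        return sum(1 for status in notes if status is Status.DONE)

    def is_releasable(self):
        """True only when both platforms have a green Releasable note."""
        return all(self.levels[(p, "releasable")] is Status.DONE for p in PLATFORMS)


# Example: a feature that is snapshotted on one platform and being made releasable.
card = FeatureCard(name="Crimson Forest", why="(placeholder, not from the talk)")
card.designed = Status.DONE
card.mark("java", "playable", Status.DONE)
card.mark("java", "snapshottable", Status.DONE)
card.mark("java", "releasable", Status.IN_PROGRESS)
print(card.green_notes(), card.is_releasable())
```

Keeping Designed as a single field and the other three levels in a per-platform mapping mirrors the point that design is shared while the remaining levels can be at different states on each codebase.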

And so, the nice thing about this is you can take a step back, look at the board, and just look at the green, right? As it fills up with green, that is how we get a sense of progress. And we can also talk about, “Okay, where do we want to go next? Where do you want to see the green flow?” So, how does the green flow, is it from bottom up or from left to right? Or what’s the sequence of things? And in some kind of like ideal lean-perfect theoretical world, maybe we would do Feature 1 from beginning to end, ship it, and then do Feature 2. So, one thing at a time, in the lean world, this is called one piece flow. And this could be useful if you’re doing, like, if you’re on an assembly line or something or maybe if you’re implementing support tickets and each feature is independent of the one before, then this might be great. But for our type of work, this would not work. You know, if we ship Feature A and get it all done and then design Feature B, then we can’t let the design of Feature B impact the design of Feature A, it’s too late. So, we want this kind of systemic thinking of how our features relate to each other. So this will not really work for us. Instead, our world is a little messier, kind of deliberately.

So, we typically design a number of features in parallel, to prototype them together and see how they fit together and how they support each other. And then we move towards snapshotting maybe the first one, while we are making the other ones playable. And then maybe we learn something from that, so maybe we change the design of Feature C or D. And then, maybe when we’re snapshotting Features B, C and D, we start designing Features F and G, and let those impact each other. Oh, and we came up with an idea of how to improve the design of E, right? So, we’re going a little bit back and forth, but in general, we are trying to prioritize the first feature as much as possible, while still allowing ourselves to do some level of parallel design so that we can have features build on each other.

A visualization like this tends to go stale quite quickly, unless you have rituals in front of the board where people look at the board and talk about it and update it. So for us, that’s called the Dashboard Review, and we do it every week. It takes about half an hour typically. And it’s an open format, anyone is allowed to join. We normally get a fairly big crowd, it varies, but it’s opt-in. Show up if you want to know what’s going on, or if you want to influence what’s going on. And most people do want to know what’s going on and they do want to influence what’s going on, so we get people showing up even though it’s an optional meeting. And that’s where we go through what’s going on, right? Is this view correct? Are there any problems people are having? Anybody waiting for someone else? Do we have any major decisions or pivots we need to make? So, it’s a really important meeting and it happens every week, so we get a fairly good pulse, so to speak.

And to the left, you see, that’s like the Feature Dashboard, that’s the board I showed you before with all the features, it focuses on the player facing thing. So, it’s like a user perspective. And the board in the back is the Tech Dashboard. It follows the same format, but the stuff up there are technical things and enabling technologies that we need in order to ship the gameplay features. So, both are equally important, so we put them both up on the wall near each other so that we can make sure they’re aligned. So, the MLP of technical infrastructure should align with the MLP of gameplay features, and sometimes we need to make trade-offs. Another purpose of the meeting is to get a sense of progress. Like, how are we doing? Are we on track? Are we going to get the minimum lovable product done in time for the release? If not, what are we going to do about it?

So, in order to get a realistic picture of how we’re doing, we can do some simple math: basically, just count the green stickies every week, right? Put it on a graph, do the math. A feature is essentially seven green sticky notes that need to get in there, right? That’s how many sticky notes fit in one column. We call those Feature points, and we can just count how many we get done per week. Sometimes it’s more, sometimes less. But on average, that line is fairly smooth. And we can do some basic forecasting, like, “Okay, there’s the date, and here’s how fast we’re getting stuff done, so how are we doing?”
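
The counting and forecasting described here amounts to a simple average-velocity calculation. The Python sketch below shows one way it might look; it is an illustration, not the team's actual method or tooling. The function names, the weekly counts, and the number of MLP features are made-up assumptions, and only the figure of seven notes per feature comes from the talk.

```python
import math


def average_velocity(cumulative_points):
    """Average feature points (green notes) completed per week.

    cumulative_points holds the total green-note count at each weekly review.
    """
    if len(cumulative_points) < 2:
        raise ValueError("need at least two weekly counts")
    weeks = len(cumulative_points) - 1
    return (cumulative_points[-1] - cumulative_points[0]) / weeks


def weeks_to_finish(cumulative_points, mlp_features, notes_per_feature=7):
    """Rough number of weeks left until every MLP note is green.

    Returns 0 if the MLP is already done, and None if no progress
    has been made (so no forecast is possible).
    """
    total = mlp_features * notes_per_feature
    remaining = total - cumulative_points[-1]
    if remaining <= 0:
        return 0
    velocity = average_velocity(cumulative_points)
    if velocity <= 0:
        return None
    return math.ceil(remaining / velocity)


# Hypothetical data: green notes counted at five consecutive weekly reviews.
weekly_counts = [0, 4, 7, 12, 16]
print(average_velocity(weekly_counts))                  # 4.0 points per week
print(weeks_to_finish(weekly_counts, mlp_features=6))   # 42 notes total, 26 left, so 7 weeks
```

Comparing the forecast against the number of weeks left before the release date gives the "are we on track?" signal discussed in the review meeting.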

So, a bit of data, but subjective data is also really important. So sometimes, maybe once per month or so, we do a quick survey and ask people, “How does it feel? What is your gut feeling looking at our current set of committed features?” And committed, by the way, means: the minimum lovable product is by definition already committed. But as we get those features done, we will commit to more and more features. And committing basically means saying, “You know what? I think this is going to get in, let’s assume this is going to get in.” And the question then is, looking at the features that are now committed, do we buy this? Does this feel realistic? Quick survey, scale of one to five, put it on a chart, and that gives us a signal. And the goal is that it should look something like this. It should be fours and fives, because otherwise the plan is probably unrealistic and we should make adaptations. So, this is like a warning system. It tells us if we’re in trouble. And again, if we notice problems early, we can almost always adapt and make the most of it. Nowadays, we tend to do this digitally, using a tool called Mentimeter. Just pull up your phone and poke in a number from one to five, and boom, we have this graph.
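
The gut-feeling survey can be read as a tiny warning signal computed from the votes. The Python sketch below is one possible interpretation, not how the team or Mentimeter actually processes results. The function name `confidence_signal` and the 80% threshold for "mostly fours and fives" are assumptions chosen for illustration; the talk only says the chart should be fours and fives.

```python
from collections import Counter


def confidence_signal(votes, threshold=0.8):
    """Summarize one-to-five confidence votes.

    Returns the vote histogram, the share of fours and fives, and a
    warning flag when that share falls below the (assumed) threshold.
    """
    if not votes:
        raise ValueError("no votes collected")
    if any(vote not in (1, 2, 3, 4, 5) for vote in votes):
        raise ValueError("votes must be whole numbers from 1 to 5")
    counts = Counter(votes)
    share_high = (counts[4] + counts[5]) / len(votes)
    return {
        "histogram": {score: counts[score] for score in range(1, 6)},
        "share_high": share_high,
        "warning": share_high < threshold,
    }


# Hypothetical survey results from two different months.
print(confidence_signal([5, 4, 4, 5, 4]))  # all fours and fives, no warning
print(confidence_signal([3, 2, 4, 3, 5]))  # only 2 of 5 are high, warning raised
```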

All right, let’s wrap this up. What are the key points that I’m trying to get across here? Product development is, to me, all about experimenting and learning. We are in the realm of complexity, we don’t know exactly what our customers need or how hard things are to build. So, we need to see it as a journey of exploration. And transparency, I find, is fantastic. It makes everybody smarter somehow, and happier. Because if people know what’s going on, they relax a little bit and they get happier, they get more focused, there’s less frustration, less gossip, less confusion. So, transparency is super useful as long as the stuff you visualize is the stuff that matters. And that can take a bit of experimentation to find, right? So, visualize the stuff that matters and then people will get engaged. It can take a while to find exactly what is the right stuff to visualize and the right way to visualize it. So, what I showed you was the result of months of experimentation to find our way of visualizing what we’re doing. And we still change it fairly regularly. I would say maybe every month or two, we make some tweak to how we visualize stuff. It’s domain specific, so you probably won’t want to use our exact system, but some parts of it might be useful to you. Especially, I would say, the MLP and MLF concepts. I think those concepts are not in any way specific to our domain, they’re just generally useful. So, I can recommend maybe trying that. So, minimum lovable product, minimum lovable feature. And finally, I want to emphasize the importance of shipping early and often. In our case, that means shipping stuff every week, getting real feedback from real players, all the way to real developers with no intermediaries. That fast feedback, I think, is crucial to the success of this game. And without that real feedback, all these sticky notes on the walls would just be one big illusion of control. And we would actually be flying blind.
So regardless of what you do to visualize what’s going on, make sure there’s some way of actually shipping stuff and getting real feedback. All right, that was it. Thanks for listening.