Should you start with Microservices?

Continuous Delivery
UXDX Europe 2019

In this debate our candidates will discuss:
Enforcing Good Design
Services Challenges
Total Cost of Ownership
Technical Debt

Nix Crabtree, Lead Principal Software Engineer, ASOS

Architecting for Change Debate – ASOS vs Voxgig – UXDX Europe 2019

Speaker 1: Do you jump all in with micro services or not?
Speaker 2: Okay. A micro services architecture requires significant architectural and engineering effort up front. Let's say you have 10 endpoints implemented as a micro services architecture from the get go. That's 10 code bases, 10 data stores, 10 pipelines to build and deploy, 10 automated provisioning pipelines. And that's just to develop them. The architectural effort required is huge. You have to design the segregation of responsibilities, the boundaries, the orchestrations, and the compositions up front, before you can even start writing any code. You effectively put your developers on hold, while the architects endlessly ponder over how best to align all of the arrows in Visio, instead of continuously deploying increments of software of business value and the feedback cycle that goes along with it. That is a lot of wasted time with nothing to show for it. But what is the alternative Nix, I hear you cry. Bless us with your enlightenment.
I am very glad you asked. These 10 endpoints can begin as a single simple code base. You create a repo, you add a skeleton code base with a ping endpoint, you manually provision a vanilla compute environment, which can be anything, anywhere; at this point, nobody cares. You manually deploy your skeleton code; Iteration One is done. And all you really need to know to be able to do that is the 10 endpoints to present. The consuming application doesn't care about the implementation, but I'll bet you cash money that the developers writing that consuming application will love getting their hands on Iteration One pretty much when they get back from lunch. That single code base can then grow iteratively and continuously. The architecture is emergent and, with good engineering practices, segregating out those endpoints is simple and straightforward, and the consuming application is blissfully unaware.
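As a rough illustration of the "Iteration One" skeleton described above, here is a minimal sketch using only Python's standard library. The `/ping` path, the response body, and the port number are assumptions made for the example; the talk does not specify any of them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_path(path: str) -> tuple[int, dict]:
    """Map a request path to (status, body). Only /ping exists in iteration one."""
    if path == "/ping":
        return 200, {"status": "pong"}
    # The ten real endpoints would be added here, one iteration at a time.
    return 404, {"error": "not found"}


class SkeletonHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = handle_path(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main(port: int = 8080) -> None:
    # Hypothetical port; serves until the process is stopped.
    HTTPServer(("0.0.0.0", port), SkeletonHandler).serve_forever()
```

Calling `main()` starts the server. Keeping the routing in a plain function (`handle_path`) is one way to make the later "segregating out" of endpoints a mechanical step, since consumers only ever see the paths.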
Speaker 1: Over to you Richard.
Speaker 3: So many thanks to my learned colleague, but I feel he needs to embrace his inner chaos demon. Yes, you will get something up and running almost immediately, but despite the best of intentions, that messy architecture, if you could call it that, which you started with on day one and week one never goes away, because it just gets deeply embedded. And someday, maybe two years later, you decide to have the big rewrite, and that's a whole world of pain. Better to start with a segregation of the system into lots of little pieces. And I dispute the difficulty of getting up and running with a micro service architecture in late 2019. These days, you don't need to put that much effort into it. You have things like Kubernetes, you have all sorts of wonderful tools from people like HashiCorp. The only little piece of architecture you have to think about at the start is an abstraction layer for the messages.
How do the messages get from one micro service to another? You can use a message bus, or you could use just a simple little library that hides the fact that you are doing HTTP. So long as you have that little core piece of architecture, which is literally one day of code, you can get up and running and allow different members of the team to work on different things. And typically in a larger enterprise context, you don't start with one developer. You are given a team, a team with very heterogeneous skills, and you have to get them all up and productive from day one. So by using micro services, they don't step on each other's toes and you keep the production quality safe, and you don't end up stuck, painted into a corner with the wonderful architecture that you invented on Monday morning. I rest my case.
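The "abstraction layer for the messages" can be pictured with a small sketch. Everything here is hypothetical: the `Transport` protocol, the `kind` field, and the `price.lookup` message are invented for illustration, and the in-process transport stands in for whatever HTTP library or message bus a team would actually use.

```python
from typing import Any, Callable, Dict, Protocol

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


class Transport(Protocol):
    """Anything that can carry a message to whoever handles it."""

    def send(self, msg: Message) -> Message: ...


class InProcessTransport:
    """Simplest possible transport: handlers live in the same process.

    An HTTP or message-bus transport would expose the same send() method,
    so calling code would not change when the wiring does.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        self._handlers[kind] = handler

    def send(self, msg: Message) -> Message:
        kind = msg.get("kind")
        if kind not in self._handlers:
            raise LookupError(f"no handler for message kind {kind!r}")
        return self._handlers[kind](msg)


def lookup_price(transport: Transport, sku: str) -> float:
    """Caller code knows the message shape, not where the service runs."""
    reply = transport.send({"kind": "price.lookup", "sku": sku})
    return float(reply["price"])
```

A caller registers a handler for `"price.lookup"` on an `InProcessTransport` and passes the transport to `lookup_price`; swapping in a networked transport later should leave `lookup_price` untouched.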
Speaker 1: Excellent. We are going to give a quick rebuttal.
Speaker 2: I think there is a problem with the voting system.
Speaker 1: Any rebuttal?
Speaker 2: No, just doing it for simple iterations is definitely the way to go.
Speaker 1: Okay. So now it's all over to you guys. I know you have been frantically voting throughout, but now is the official time.
Speaker 2: You haven't even finished talking. Come on.
Speaker 1: So has everybody entered their votes?
Speaker 2: No, clearly not. No, not yet.
Speaker 1: I have a feeling Nix, you have lost this round. We will wait another few seconds.
Speaker 2: Come on.
Speaker 1: I am going to change the question now. So Round 1 goes to Richard. Round of applause for Richard. So let's start Round 2.
So Richard, ensuring good design. So if you are doing micro services, it could be, how do you keep everything coordinated when there are lots of little pieces moving?
Speaker 3: So, the way that you ensure good design is you let it emerge. It's really, really hard on day one to design a large complex software system. I've tried and failed many times. You end up back in that rewrite situation. Chris Lowes talked this morning, and he spoke about shimming the interface between the front end and the back end. And that's because the interface that the backend was offering to the front end wasn't sufficient. So they had to allow the front end engineers to develop their own interface specification, or way of talking to the backend, that was more appropriate for what they needed, responding to the emergent needs of the application. If you accept the fact that micro services have to talk to each other with messages, and you make messages the primary unit of thinking, the primary unit of design, it frees you up to start with a relatively arbitrary set of messages.
And then as the system evolves, change those messages into something that makes more sense, allow the domains to emerge. Now you are stuck with legacy services and you are stuck with legacy messages, but guess what? Shimming is easy. You don't have to write custom code to shim. All you do is translate between the old message types and the new ones, and vice versa. And if you end up in a situation as Chris did, where you have a 10X explosion of messages to the back end, well, if you are running a large system, you are going to be doing canary deployments. So you are only putting 5% of your traffic through this new messaging system. And you can tell if that's going to kill the system or not. And because you are using this message oriented thinking, this message oriented design, it's easy to tell which messages are at fault. And in that way you remove the burden of being a genius effectively, and you can just, as an ordinary developer, get work done.
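The message-translation shim mentioned above might look something like the following sketch. The message shapes (`get_user` versus `user.fetch`) and the stand-in service are made up for the example; only the idea of translating old messages to new ones and back comes from the talk.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


def request_old_to_new(old: Message) -> Message:
    """Translate a legacy request into the newer message shape."""
    return {"kind": "user.fetch", "user_id": old["id"]}


def reply_new_to_old(new: Message) -> Message:
    """Translate the newer reply back into what legacy callers expect."""
    user = new["user"]
    return {"id": user["id"], "name": user["name"]}


def shim_legacy(new_handler: Handler) -> Handler:
    """Wrap a new-style handler so it can answer old-style messages."""

    def legacy_handler(old_request: Message) -> Message:
        return reply_new_to_old(new_handler(request_old_to_new(old_request)))

    return legacy_handler


def new_user_service(msg: Message) -> Message:
    """Stand-in for the newer service; returns canned data for the sketch."""
    return {"user": {"id": msg["user_id"], "name": "Ada", "email": "ada@example.com"}}
```

With this in place, `shim_legacy(new_user_service)` accepts `{"cmd": "get_user", "id": 7}` and answers in the legacy shape, so old callers keep working while the domain model changes underneath them.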
Speaker 1: Great. Thank you. And over to Nix.
Speaker 2: Okay. So that all sounds very boring. Good software design is all about education. It's about bringing people with you on the journey. It's about creating an environment where learning and putting that into practice immediately is a natural part of every line of code that you write. I am not talking about putting people in a classroom and just opening the PowerPoint fire hose. I am talking about a high trust team, which is as accepting of failure as it is of success. And that is a rich and fertile soil, which nurtures people along with their skills and their knowledge. Good software is built on some universal principles… these principles become second nature very quickly, when they are continuously practiced. That mindset doesn't develop just on its own. It develops by learning through feedback from others, from your peers. A “test first” pairing approach is a really excellent foundation for this.
It requires you to state what you are going to do before you do it. It allows your tooling to create skeleton code for you, reducing basic human errors. And then there is a byproduct: you have a short-feedback-cycle test suite, which tells you if you are going astray. Pair programming is a great partner to this. Apart from the static code analysis tools in your IDE, it is the shortest possible feedback loop, but it has the benefit of being backed up by experience, by domain knowledge, and the team's own working practices. And that provides a rapid learning cycle like no other. You learn something as you are doing it. And you put that into practice immediately and you see the reward straight away. And that also builds trust. And in a high trust team, they are much more willing to learn from each other and help everyone else become better as well.
There is a big difference between knowing you shouldn't and being told that you can't. Trying to stop people doing things wrong through the process of gatekeeping is like plugging holes in a sinking ship. They need to learn good design. It's like explaining to a child why they shouldn't touch their freshly iced birthday cake and telling them you are putting it on a high shelf so they can't get to it. One of those scenarios is going to show you just how quickly and quietly they can engineer a precariously balanced furniture tower and put finger marks in the thing you made for them.
Speaker 1: Thank you. Just on time. Okay. So on ensuring good design: is it around the micro services, the quick easy replacement, or is it about the teaching and the learning and the coaching?
Speaker 2: I think we already have the answer.
Speaker 1: I think this is going one way, unfortunately.
Speaker 1: Actually, is there any quick rebuttal that you want to give?
Speaker 3: Yes, a very short one. In theory, theory and practice are the same, but in practice, they are not.
Speaker 1: Okay. So we will just give another few seconds. Anybody want to give the last vote? Okay. So we are one all. Excellent. Keeping things interesting.
So the next section that we will be talking about is managing micro services. So it's inevitable that micro services are going to end up in your architecture, but then as they grow and grow and grow, it can become very difficult to keep track and to manage them. So on this one, we will start with Nix.
Speaker 2: Thank you. Okay, so managing micro services is hard, and here is why. If you go back to my earlier example, let's say we have 10 micro services. Micro services that don't work as part of a micro services architecture are just services, right? So typically an interaction with a micro services architecture involves orchestrated calls to one or more of the services. Let's take one micro service in our example and say that actually it calls four others to compose a response. If you are running at scale, it becomes way more complex. So let's say, conservatively, you have two running instances of each micro service, and those two share a data store, and we have that set up in two regions. In our example, that initial request can come into one of four instances, backed by one of two data stores. And then the orchestrated calls could hit one of 16 instances backed by eight data stores. And that, as you can well imagine, is a lot of logging and telemetry data. And it is well beyond the realms at this point of sifting through the log when something goes wrong.
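The instance counts quoted above can be checked with a few lines of arithmetic. This sketch assumes the entry service fans out to four downstream services, which is what the totals of 16 instances and eight data stores imply; the constant and function names are invented for the example.

```python
INSTANCES_PER_REGION = 2   # two running instances share one data store
REGIONS = 2
DOWNSTREAM_SERVICES = 4    # assumption: the entry service calls four others


def footprint(service_count: int) -> tuple[int, int]:
    """Return (instances, data_stores) that a request could land on."""
    instances = service_count * INSTANCES_PER_REGION * REGIONS
    data_stores = service_count * REGIONS  # one shared store per service per region
    return instances, data_stores


entry = footprint(1)                         # the service receiving the request
downstream = footprint(DOWNSTREAM_SERVICES)  # the services it orchestrates
```

Under these assumptions `entry` is `(4, 2)` and `downstream` is `(16, 8)`, matching the figures in the talk.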
So, you have to build in logging and telemetry from the start and it can't just be writing information out to log files. It can't just be one micro service. A micro services architecture is a single organism and the telemetry and logging has to have that built in from the start; it has to be a single view. So how do we find the right piece of hay in a haystack? Correlation, dependency tracking, and a really good log analytics system. Correlation means that your micro services should accept and pass on some kind of token, which identifies that as part of a larger journey. Dependency tracking is all about capturing telemetry data for the dependencies your micro service calls, and then tying those two things together into a single coherent story, is the job of a good log analytics system. Which means you can quickly and easily pinpoint a particular journey and see all of the data about it, in isolation.
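Correlation and dependency tracking can be sketched in a few functions. The header name `X-Correlation-ID`, the two toy services, and the in-memory `sink` list are all assumptions for illustration; a real system would ship these records to a log analytics platform rather than a Python list.

```python
import json
import uuid
from typing import Any, Dict, List

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


def correlation_id_from(headers: Dict[str, str]) -> str:
    """Accept an incoming token, or start a new journey if none was sent."""
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log_event(sink: List[str], service: str, correlation_id: str,
              event: str, **fields: Any) -> None:
    """Write one structured record; every record carries the token."""
    record = {"service": service, "correlation_id": correlation_id,
              "event": event, **fields}
    sink.append(json.dumps(record))


def stock_service(headers: Dict[str, str], sink: List[str]) -> int:
    cid = correlation_id_from(headers)
    log_event(sink, "stock", cid, "stock.checked", sku="abc")
    return 3


def basket_service(headers: Dict[str, str], sink: List[str]) -> int:
    cid = correlation_id_from(headers)
    log_event(sink, "basket", cid, "request.received")
    # Dependency call: pass the token on so both services share one journey.
    in_stock = stock_service({CORRELATION_HEADER: cid}, sink)
    log_event(sink, "basket", cid, "dependency.completed", dependency="stock")
    return in_stock


def journey(sink: List[str], correlation_id: str) -> List[Dict[str, Any]]:
    """The 'single view': every record for one journey, across services."""
    return [r for r in map(json.loads, sink) if r["correlation_id"] == correlation_id]
```

Filtering by the token is what turns logs from several services into one story: `journey(sink, some_id)` returns the basket and stock records for that request in the order they were written.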
Dependency management also becomes challenging. One of the tenets of a micro services architecture is the ability to make and deploy change independently from other services, but to achieve this in its pure form is a mind boggling set of processes and practices. You need to think about API version management, as well as binary version management. Business requirements are likely to impact multiple teams. So you need to orchestrate those changes and how they are built and deployed. Breaking changes can therefore result in regression to monolithic deployment.
Speaker 1: Excellent. Thank you. Finished on time, and over to Richard.
Speaker 3: I think my wonderful colleague is suffering from an approach that a lot of developers have when they have started off in traditional environments, which is to put services first and to think in terms of the concrete diagram. Throw all that away. Think in terms of messages first. Understand your system in terms of the message flows: a particular type of message generates another type of message, or two different types of messages. If you think about the system in that way, you can think about analysing the whole health of the system in terms of those message flows, in the same way that a doctor measures your heartbeat, not the flows of sodium ions through your cells; you understand the system at a higher level of abstraction. In order to deal with the issues around management and deployment and complexity, you always start off with a very, very small system with a very small number of services. And the way that your system changes is not by large top down design, it's by incremental change of the system itself. Any new service, any new message that is implemented by some services, always enters the system by a single deployment of a single service instance. And that never happens without having some covering fire, having some older service that also services that message in some way, perhaps providing less functionality. So you always have this model where you have a version A, and then you have version A and version B at the same time. And then you remove version A, and now version B, which is the new one, keeps running. And in that way, you always know if you break something, because you know that the very last incremental change to the system was to deploy a specific instance of version B, and rolling that back gets you back to a good state.
And in that way, you don't need to worry about all this complex log analysis of figuring out what's gone wrong, because each individual change is something that you have fully under control. The other aspect, which trips people up, is the belief that you have to have extremely well defined schemas, that you have to have very strict typing, essentially, of your messages. I reject that completely and say Postel’s Law is the way to go: "be conservative in what you send, be liberal in what you accept". It's the only way to manage running multiple different versions of multiple services, and multiple different types of implementation in the system itself. And that means that, inside a service, you are free to determine whether you are going to be strict about what you do, whether you reject a given message or whether you make a best effort attempt to keep the system up and running. You don't have to worry about fragility introduced by having extremely strict schemas. And if you take those strategies all together, it allows you to keep that complexity under control and allows you to leverage things like Kubernetes, which look after the day to day management of the system.
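Postel's Law applied to messages is sometimes called a tolerant reader. A minimal sketch, assuming a made-up order message with `order_id` and `quantity` fields: the writer emits only the documented fields with checked values, while the reader ignores unknown fields, coerces types, and falls back to defaults where it reasonably can. Falling back to a quantity of 1 is an illustrative "best effort" choice, not a recommendation.

```python
from typing import Any, Dict


def write_order(order_id: str, quantity: int) -> Dict[str, Any]:
    """Conservative in what we send: documented fields only, validated."""
    if not order_id:
        raise ValueError("order_id is required")
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return {"order_id": order_id, "quantity": quantity}


def read_order(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Liberal in what we accept: ignore extras, coerce, default."""
    if "order_id" not in msg:
        # Without an identifier there is nothing sensible to do; reject.
        raise ValueError("order_id is required")
    try:
        quantity = int(msg.get("quantity", 1))
    except (TypeError, ValueError):
        quantity = 1  # best-effort fallback chosen for this sketch
    return {"order_id": str(msg["order_id"]), "quantity": quantity}
```

A reader written this way keeps working when a newer sender adds a field such as `gift_wrap`, which is the property that lets several versions of a service coexist.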
Speaker 1: Great. Any rebuttal?
Speaker 2: It sounds like you are just creating small increments of legacy services all the time, and then using a strangler pattern to get rid of them, instead of actually doing it collaboratively and making sure that everyone is on the same page.
Speaker 3: Don't introduce people into the system.
Speaker 1: Okay, great. I see people are already starting to vote. So we will just give another few seconds, if you want to complete your voting. Somebody once commented that feedback can sometimes be a little bit competitive among speakers. I think this is a competitive event, another 10 seconds. We will see whether anything's going to change.
Speaker 2: Come on, people.
Speaker 3: Have to go down to goal difference.
Speaker 1: And we are on to the next question. Technical debt. So tech debt exists in any system, regardless of whether you are doing micro services. Whatever assumptions you make now they will inevitably prove to be false. So managing tech debt is critical. Richard, would you like to start?
Speaker 3: So this is the whole point of using micro services. This is the whole reason for upending your way of thinking about things and putting up with some of the minor technical issues that Nix has alluded to. In our system, we still have the very first micro service that we built, running a particular piece of functionality. It's been running for two years, and it's completely incompatible with the rest of the system. We have come up with much better designs that have emerged from things that we built later, but it's still running. It still keeps our clients happy. And when we bring in junior engineers, they can do a little bit of iterative change on it to fix it up. What is actually going to happen, a little bit down the road, is that we are going to completely recut that service, build an entirely new one, and then deploy both of them at the same time and put a few new customers through the new service.
And once we are happy that everything is working, we are going to take the old code, which is full of technical debt, badly written, not efficient at all, and we are just going to throw it away... We are just going to delete that entire code base. What has happened there, is that no developer has got an emotional connection to that code. Nobody has decided that it's theirs and they are going to defend it against newcomers. We don't have to unravel the bad design that we have put in place there. Yes, there is a little bit of the database schema that needs modification, but that's actually easy because one of the rules around micro services that I like to follow, is that you don't have to have separate data stores for each micro service, but even if you are using one database, stick to one table per micro service, and also, shield that database by having a messaging layer that allows you to talk to the database.
So again, you can go back to that message translation tactic, which I spoke about earlier. But this also lets us build new services rapidly and allow junior engineers to build the services, because even if they make a mess, even if it's not efficient, even if they haven't done things properly, even if they have created a message API which is not quite right, which is inefficient, we know down the road that we can throw it away. And when somebody starts working for me, I always say, don't get attached to your code because I will personally be deleting your code in three months’ time.
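Running the old and new service side by side and sending only a few customers to the new one can be sketched as a deterministic router. The function names and the hashing scheme are assumptions for the example; the talk does not say how customers are selected.

```python
import hashlib


def bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_version(customer_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of customers to the new service."""
    return "new" if bucket(customer_id) < canary_percent else "old"
```

Because the bucket is derived from the customer identifier, the same customer always lands on the same version. Raising `canary_percent` to 100 retires the old service, and setting it back to 0 is the rollback.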
Speaker 2: So, my learned colleague seems to be drawing a parallel with the use of the plastic water bottle. It's disposable, we don't care how it's made. We just throw it away and get a new one, and we all know where that's got us. So let's talk about tech debt in terms of two things. Let's talk about how it impacts your customer: bugs or emergent features. I have said it before; I will say it again. Good software engineering principles are fundamental to preventing tech debt in your code base. If you don't need it, don't put it in. If you are building for what you think you might need, you have code that is already doing nothing but complicating things, or even worse, it's doing the wrong thing and now you have to maintain it. That's tech debt. When requirements change, as we expect them to do in an Agile mindset, you have code that's wrong, that nobody asked for, that you have to rip out or change to meet the requirements, and that's tech debt caused by your tech debt.
There is another aspect of this called contagion. Contagion is a term I first saw used by Bill Clark of Riot Games, League of Legends fame. Contagion is viral tech debt. If someone writes something badly, even if you are going to throw it away, especially if it's someone senior, other developers will follow that implementation. Maybe they straight up copy and paste it, but maybe it actually even contaminates the applications that interact with your application. And suddenly one piece of bad code starts springing up all over the place. And it's really hard to track it down and get rid of it. There is also tech debt that slows down your developers; obvious things, lack of automation, clunky pipelines, inefficient release processes, so on and so forth. But in the short time I have left, I want to talk about one thing: do not touch the shiny things. Do not even look at the shiny things, do not listen to the sweet siren song of the shiny things. Start with the vanilla tech. Tech you already know like the back of your hand, if possible, or at least which has been used successfully in your organisation. Start with the smallest and most basic version. Change only the configuration you need to and have to, and then leave it alone. You are not automating those little tweaks. They will come back as tech debt.
Speaker 1: Thank you. Okay. Do we have a rebuttal?
Speaker 3: I actually think that the viral tech debt piece, it's not a rebuttal at all. I actually want to acknowledge viral tech debt is hugely, hugely significant.
Speaker 2: No, that wasn't mine. That was Bill Clark.
Speaker 3: You introduced it to me. Forget about everything I said. This one is super important. If you are a senior developer and you slack off and you don't follow your own processes and the quality standards that you have set for yourself or expect from your team, that will infect everybody else in your team. And I think if there is one takeaway from this particular debate, it's that idea which I have found really, really useful and I feel really guilty about stuff typed under my own code base now, as a result. So, sorry, that's a super important point, which I have to acknowledge.
Speaker 2: Thank you very much.
Speaker 1: Great. Okay. So let's do the final round of voting.
Speaker 2: We have closing arguments.
Speaker 1: Oh, no, no, I am just finishing off this.
Speaker 2: Okay. This last one, good.
Speaker 1: So if everybody can finish their voting, I've noticed a pattern, whoever speaks second wins.
Speaker 2: This is poor. It's whoever is speaking about architecture, and whoever is speaking about engineering stuff.
Speaker 1: Okay. So we are actually 50-50. So in each question we are two and two. So what we are going to do is a quick closing argument. Unfortunately, I am going to ask you to start.
Speaker 2: Okay. I think it is actually fair to say that Richard and I agree on many of the things we debated and it was really interesting to play devil's advocate to each other. I think we both have a passion for doing things well, continuously improving how we do them and bringing people with us on the journey, which is great.
However, in the context of this debate, Richard is my sworn enemy, seeking to defile the good, to befoul the proper and undo all of the great and hard won things that we, you and I, have strived to build over the years. It is therefore with a heavy heart and a strong will that you must, in good conscience, reject his machinations and rally to the golden flag of proper development. Mount your unicorns, valiant people, and ride with me across the utopian rainbow fields of the greater good for all noble developers everywhere. Thank you.
Speaker 1: Try to follow that, Richard.
Speaker 3: Despite my learned colleague’s eloquent admonitions, they are a thin veneer over an ideology which will lead you astray; very, very far into the desert where you will die of starvation and thirst. The only way to build sustainable things, the only way to build things that are robust and actually work, is to put aside the conceit that our human intelligence is good enough to do the job, and embrace the way that biology and nature have done it. Evolutionary generation of design is a much stronger way to end up with a system that does what you want. You may not entirely understand it; you may have a deep level of discomfort with the fact that nobody in your organization actually understands the hundreds of micro services and how they interact. But yet, those types of systems are the only types of systems which can ultimately be robust and which can ultimately let you move really, really quickly. Because again, we go back to this idea of just making small, safe changes all the time.
Speaker 1: Thank you. So we will do the last voting. Nix is currently in the lead. So has everybody put in their votes? I think I am going to have to call this one, Richard. Congratulations, Nix.
Speaker 2: Unicorns for the win. I even have rainbow shoes.
Speaker 1: But we do have time for a few questions.