Why So Many ML Models Don't Make It To Production?


Continuous Delivery

Even though the future is about data, machine learning and artificial intelligence, between 80% and 90% of ML models are never deployed (according to various researchers).
In this talk, I'll share the different challenges that we experienced within this space, what solutions we have implemented and any future improvements we have planned in this space.

Hi, everyone. I hope you are enjoying your time at the UXDX event. I am Dana, Director of Engineering at Trustpilot. I'm based in Copenhagen, and I work in the content moderation, fraud and online abuse detection space. I am blessed and grateful to be working with talented people on solving business problems with Machine Learning. I'm here today to share a few of our learnings in that space.

Trustpilot is a reviews platform with 800 employees across eight locations. There are more than 100 million reviews on the platform, and a bit more than 500 websites with reviews on Trustpilot. Now, back to our topic: why do so many Machine Learning models not make it to production? Some say that as many as 87% of data science projects don't make it to production. I dug deeper into this topic, and in a Gartner report I found that 37% of 3,000 surveyed leaders had deployed AI or planned to do so shortly. I know that this number could look small, but it represents an increase of 270% over four years, which shows the industry's focus on Machine Learning and AI. At the same time, in a McKinsey report, I found that of 160 reviewed AI use cases, 88% did not progress beyond the experimental stage.

So, what is the root cause? Well, digging deeper into this topic, I found that sometimes the root cause could be performance or quality concerns around the model. Sometimes companies find a simpler or more cost-effective solution to their business problem, so they give up on Machine Learning. And sometimes a lack of management support could also be the root cause. Now let's imagine that we do have management support and a strong business case for applying Machine Learning; let's see what other impediments or blockers we could experience in shipping Machine Learning to production.

When it comes to data, my main learning is that high-quality, labeled and available data keeps being an expensive asset. So, brace yourselves. For example, say you need labeled data for your Machine Learning; by labeled data I mean data that carries tags referring to a classification or categorisation. If you need labeled data, companies usually have a few avenues to choose from: buy it, label it in house, or label it externally if they don't have the capacity to do it in house. Each of these options requires coordination, alignment, research and quality checks. So, in reality, this effort could take months, and of course months of money invested.

When it comes to data availability, there are usually two scenarios: either the data is available in your company but not easily accessible because of a siloed data approach, or the data is available but, in order to be used for Machine Learning, it requires cleaning, processing, extraction and similar work. Either way, the effort could take months.

So, my recommendation in this space, if you really have Machine Learning on the company agenda: the first recommendation is to invest in data as a service, and this is a recommendation that you'll find all over the internet from other leaders. It's about democratising your data, making it available to your data scientists and engineers. I found an image from PwC very interesting, where they present what democratising data and AI could look like for a company. The secret here is that once you start to democratise your data, further opportunities will come; probably consider, in the future, democratising model training, the deployment process and all the other components. Another recommendation in the data space: if you continuously need labeled data for your Machine Learning, then I would recommend considering embedding the labeling system in your existing products, so that users label data while they're using the product. The most well-known example is Google Photos asking us to confirm the faces in different pictures.
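To illustrate the idea of embedding labeling into an existing product, here is a minimal sketch in Python. All names are hypothetical, and this is not Trustpilot's actual implementation; it just shows the pattern of turning user confirmations into labeled training examples:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LabeledExample:
    item_id: str                       # e.g. a photo or review identifier
    label: str                         # the label the model suggested
    confirmed: Optional[bool] = None   # user feedback, if any yet

@dataclass
class LabelStore:
    """Collects user confirmations so they can later feed model retraining."""
    examples: dict = field(default_factory=dict)

    def suggest(self, item_id: str, label: str) -> None:
        # The product shows the model's guess and asks the user to confirm.
        self.examples[item_id] = LabeledExample(item_id, label)

    def record_feedback(self, item_id: str, confirmed: bool) -> None:
        self.examples[item_id].confirmed = confirmed

    def training_data(self) -> list:
        # Only examples the user explicitly confirmed become labeled data.
        return [e for e in self.examples.values() if e.confirmed]

store = LabelStore()
store.suggest("photo-1", "face:alice")
store.record_feedback("photo-1", True)   # user taps "Yes, that's Alice"
store.suggest("photo-2", "face:bob")
store.record_feedback("photo-2", False)  # user rejects the suggestion
print(len(store.training_data()))        # 1 confirmed example
```

The point of the sketch is that labeling becomes a by-product of normal product usage, instead of a months-long standalone labeling project.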

Now, when it comes to tools, my main learning in this space is that, well, when I was thinking about Machine Learning I was mainly thinking about data science and data scientists, and I believe this didn't happen only to me. In reality, in order to ship Machine Learning to production, more is needed: principles, engineering skills, tools and other components. We do have to think about retraining the model, versioning the model, monitoring the data pipeline and other components. Google has a similar view explaining the components of a Machine Learning system: they mention configuration management, feature engineering, data verification and monitoring. So, Machine Learning is not only about developing the ML code but also about all the tools needed to ship it to production.

The same way we saw the rise of DevOps a few years ago, we are now seeing the rise of MLOps. MLOps refers to techniques for continuous integration, continuous delivery, configuration management and other automation in Machine Learning. My main recommendation in this space is to define the ML components needed or desired and invest in developing those components. Think about what foundation you need and invest in it. There are many learnings from the DevOps philosophy, and I do hope that we can apply some of them in MLOps.
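The versioning, retraining and monitoring components mentioned above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions (the registry API, the "review-fraud" model name and the drift tolerance are all hypothetical), not a reference to any specific MLOps product:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ModelVersion:
    name: str
    version: int
    trained_at: str
    metrics: dict  # e.g. validation scores captured at training time

class ModelRegistry:
    """Minimal sketch of the versioning and monitoring pieces of MLOps."""
    def __init__(self):
        self._versions = {}

    def register(self, name: str, metrics: dict) -> ModelVersion:
        # Each registration gets a monotonically increasing version number.
        version = len(self._versions.get(name, [])) + 1
        mv = ModelVersion(name, version,
                          datetime.now(timezone.utc).isoformat(), metrics)
        self._versions.setdefault(name, []).append(mv)
        return mv

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def should_retrain(self, name: str, live_accuracy: float,
                       tolerance: float = 0.05) -> bool:
        # A monitoring hook: flag retraining when live accuracy drifts
        # below the accuracy recorded at training time by more than
        # `tolerance`.
        trained = self.latest(name).metrics["accuracy"]
        return live_accuracy < trained - tolerance

registry = ModelRegistry()
registry.register("review-fraud", {"accuracy": 0.91})
registry.register("review-fraud", {"accuracy": 0.93})
print(registry.latest("review-fraud").version)        # 2
print(registry.should_retrain("review-fraud", 0.85))  # True: drifted too far
```

Real setups would persist the registry and wire `should_retrain` to a scheduled pipeline, but the three responsibilities — version, record metrics, watch for drift — are the same.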

My other recommendation when it comes to tooling is to define what ML tech debt means, because sooner or later it will come to your backlog and to your team. Gather your team and discuss responsibilities: what can data scientists do around tech debt, what can data engineers do, and what are the tools or practices to consider in this space?

Now, when it comes to processes, the main learning was that, as I shared earlier, in order to ship Machine Learning to production, besides the Machine Learning knowledge and skills, there are more roles and skills involved: you need data engineers, configuration management and operations. What we discovered in this area was, well, I'm proud to say that Trustpilot is a fast-paced environment with frequent releases, and our assumption was that Machine Learning development would just follow that process because it was so embedded and so strong. We were wrong, and we discovered that in parallel with our software development process, or on top of it, we also have to accommodate Machine Learning development, which is different. It's different in the sense that in software development we have daily or weekly releases.

In Machine Learning development, you might have a one-time release for your model, and then a lot of time is actually invested in experimenting or data discovery, which could take weeks; so there is no daily or weekly release. When it comes to the profiles involved: engineers are builders, they look at a problem and ask themselves 'How do I build this?' Data scientists, on the other hand, start their work with a question or an assumption, and they work towards confirming that assumption or answering that question. That is also something to consider in ways of working. So, the question here is how to bring engineers and data scientists together. In a way, this is not a new problem; we had the same conversation about how to bring testers closer to engineers, or how to bring site reliability engineers closer to engineers.

So, my recommendation in this area is to consider multidisciplinary teams: have your data scientists, engineers, designers and product managers working together in one team. Now, we all know that this alone is not enough for collaboration. What we considered in our teams was defining the touch points needed between data scientists and engineers, so we avoid a waterfall approach to developing and shipping Machine Learning. By touch points, I mean documentation such as requests for comments, demos or reviews, the kinds of activities that bring data scientists and engineers together.

Now, when it comes to people, a few learnings in this area. Initially I wanted to name this chapter 'Existing Systems', the logic being that we noticed our products and systems, as they are today, are not always ready to absorb Machine Learning models. Now, I can't put the blame on the systems, but I can put my hope in the people who are building and maintaining those systems. And when I'm referring to people, I'm not referring only to data engineers and data scientists, but also to product managers, UX designers and UX researchers. We all have our own assumptions around the user experience, the results of the model or the development of the model. So, it's crucial to engage the team in testing those assumptions and asking questions around them.

So, my recommendation is, as always, to focus on user experience: explore and understand how users will interact with the predictions or the outcome of the model. How will users behave if they know there is AI behind the product? There is also the element of trust to be considered. And when it comes to developing and shipping Machine Learning, find a way to spark conversation between team members. What we're considering here is to have a checklist of questions that the team can go through when grooming Machine Learning development. Things to consider on the checklist are, for example: what is the complexity of the model, and how does it compare with the previous model? Our mistake in the past was to look at all models as being the same, which was not the case. Other questions to consider for the checklist are: how can we approach Machine Learning in a more iterative way? Do we need all the features of the model in this iteration? What are the data quality checks to consider? How do we test, and how do we perform a dry run before we launch the model? We're working on expanding the list, and I'm curious to see similar lists coming out from the industry as well.

And my last recommendation, in the people area: in one of the previous Gartner reports, leaders who have Machine Learning or AI on their agenda shared that their main challenge is a shortage of talent related to AI or Machine Learning. That's why my recommendation is, when developing Machine Learning, to also facilitate learning among team members and to think about upskilling where needed.

Now, a short summary of what we discussed. If Machine Learning is on your agenda: invest in data as a service and unlock the potential of your teams; define MLOps in your organisation and the foundation you need in that space; find the harmony between Machine Learning development and software development; and facilitate learning and growing skills within your company. I would love to continue these conversations, so please reach out to me on LinkedIn or Twitter. I'm happy to share more learnings. Thank you so much.