Navigating Challenges and Technical Debt in AI Engineering for Finance

Talk

Continuous Development
UXDX EMEA 2024

Join Ahmed Menshawy as he shares the intricate balance between maintaining model complexity and meeting the operational demands of real-time processing in the financial sector. Dive into the practical challenges and solutions in deploying advanced machine learning technologies in a large enterprise.

Ahmed Menshawy

Ahmed Menshawy, Vice President - AI Engineering, Mastercard

Today, I’ll be discussing AI engineering efforts at Mastercard. My team is mainly responsible for building the end-to-end machine learning pipelines around some of the AI systems that touch your daily lives. We add intelligence to every single transaction processed by Mastercard, and I’m excited to talk to you today about GenAI and how we use it at Mastercard.
It’s an exciting time because we are on the edge of a new technological era, where a powerful relationship between humans and technology is unfolding right before us. I'll also touch on how AI is expanding from excellence in structured data to unstructured data, augmenting human intelligence without necessarily replacing jobs. Finally, I'll discuss the challenges and technical debt associated with deploying these models in production.
Over the last two decades, AI has excelled at labeling things: fraud versus not fraud, for instance. You give it an image, and it can detect the objects within it. However, neatly labeled, structured data isn’t the reality for most organizations: more than 80% of organizational data is estimated to be unstructured, and over 71% of organizations struggle with managing and securing such data. Now, with GenAI technology, you can easily integrate domain-specific data into your generative AI applications, allowing you to formulate answers based on this specialized data while also discovering relationships and patterns that weren’t previously accessible with traditional AI or neural-based techniques.
The relationship between humans and technology isn’t a new one. Over the past 200 years, mathematicians and visionaries have dedicated their lives to creating technologies that reduce human labor while automating complex analytical and computational tasks. If you’re worried about GenAI potentially taking over jobs, I recommend a Nature editorial that addresses this, titled “Stop talking about tomorrow’s AI doomsday when AI poses risks today”. It points out that current risks, such as bias and fairness, need attention far more urgently than doomsday scenarios, as we’re still far from developing the algorithmic foundations for AGI: a true artificial general intelligence that can originate ideas on its own.
One visionary from those 200 years is Ada Lovelace, often called the world’s first computer programmer. Writing about Charles Babbage’s Analytical Engine, she speculated that such a machine could eventually act on more than numbers, such as musical notation, and even be used to compose music, much as GenAI does today. Yet she was clear in her belief that the engine could not originate anything on its own; it could only perform whatever we know how to instruct it to do. This idea still holds true over 180 years later, as we don’t yet have the algorithmic foundation for AI to truly originate anything independently.
Some people have the misconception that language models were developed by OpenAI alone. However, the concept of language modeling, predicting the next word based on its context, is actually an old idea. What OpenAI did with ChatGPT was to incorporate an instruction dataset, allowing users to interact with the model as if conversing with another human. They achieved this by hiring contractors to create an instruction dataset of specific prompts and ranked responses, then using reinforcement learning techniques on top of those rankings to fine-tune the language model further. Trained on internet-scale data and tuned this way, the model became something users could interact with as they would with a colleague, asking it to perform specific tasks. A rough sketch of what such records might look like follows.
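To make that concrete, here is a minimal sketch of the two kinds of records involved. The field names and the examples are hypothetical and purely illustrative; they are not OpenAI’s actual schema.

```python
# Illustrative records for instruction tuning and preference ranking.
# Field names and contents are hypothetical, not OpenAI's actual schema.

instruction_example = {
    "prompt": "Summarize the key risks of card-not-present fraud.",
    "response": "Card-not-present fraud typically involves stolen card details...",
}

preference_example = {
    "prompt": "Summarize the key risks of card-not-present fraud.",
    # Contractors rank candidate completions from best to worst; the
    # rankings train a reward model used during RLHF fine-tuning.
    "ranked_responses": [
        "Card-not-present fraud occurs when the physical card is absent...",  # best
        "Fraud is bad and should be avoided.",                                # worst
    ],
}
```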
These advancements have allowed GenAI, and specifically ChatGPT-like technology, to be used for many applications while helping organizations manage the tremendous amounts of unstructured data they hold. At Mastercard, we have recently augmented our fraud detection capabilities with GenAI, achieving up to a 300% increase in accuracy for certain corner cases that previously challenged our AI solutions.
Now, let’s discuss the essentials needed to take such complex and large language models into production. To build a GenAI app, you need several things:

  1. Access to a variety of foundation models - Examples include the Llama model or OpenAI’s models.
  2. A secure environment - Necessary for building and customizing domain-specific applications.
  3. A variety of tools - For training, fine-tuning, and deploying these models.
  4. Specialized infrastructure - Traditional infrastructure used in machine learning is often insufficient for handling large language models. A range of accelerators is needed to provide these intelligence capabilities to customers.
I’ve color-coded these essentials based on the challenges we might face at each stage. Access to foundation models is the least challenging, thanks to open-source initiatives from companies like Meta. Most companies also have their own AI environments for building applications, though they may need additional accelerators for fine-tuning these models. Having access to a variety of tools that enable you to build and deploy such models is more challenging: the size of these models is unprecedented, given the vast amount of internet-scale data they’re trained on.
In our recent paper published in ACM, we noted that open-source model code, such as the Llama model, accounts for only about 3-5% of what goes into building a complete GenAI application. The remaining 95% or more involves the components around it: guardrails, accelerators, GPUs, and the infrastructure needed to serve the model at scale and meet business requirements for latency and throughput.
Currently, there are two ways companies are using GenAI. The first is the traditional “closed-book” approach, where the model is trained on internet-scale data and regularly updated. However, this method has several issues:

  1. Hallucination - The model might provide confident but incorrect answers.
  2. Attribution - It’s challenging to determine why the model generated a particular answer, which is crucial for high-stakes AI systems.
  3. Data Currency - The model can become outdated quickly, requiring frequent retraining to reflect new information and comply with regulations that allow users to opt out of data collection.
In fraud detection, for example, it’s essential to customize the model with specific terminology to improve accuracy. To address these issues, most companies now use the “open-book” approach, which attaches an external memory to the language model. Imagine an open-book exam, where you answer questions with the book’s context at hand. This allows the model to use specific, context-rich prompts that improve factual accuracy and reduce hallucinations. For instance, if a user asks a question about Harry Potter, the model can be given passages from the relevant Harry Potter book as part of its prompt, producing more reliable and domain-specific responses. The sketch below shows the basic shape of this retrieve-then-prompt pattern.
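Here is a minimal, self-contained sketch of that pattern. The toy corpus, the bag-of-words retriever, and the call_llm stub are assumptions for illustration; a production system would use dense embeddings, a vector store, and a real model API.

```python
# Minimal sketch of the "open-book" (retrieval-augmented) pattern:
# retrieve domain passages, attach them to the prompt, then ask the model.
import math
from collections import Counter

# Toy domain corpus; in practice this is a large indexed knowledge base.
CORPUS = [
    "Card-not-present fraud occurs when the physical card is absent, e.g. online.",
    "Chargebacks let cardholders dispute transactions they did not authorize.",
    "Velocity checks flag unusually frequent transactions on a single account.",
]

def score(query: str, passage: str) -> float:
    """Cosine similarity over raw word counts (a toy retriever)."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[w] * p[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in p.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(CORPUS, key=lambda passage: score(query, passage), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model client."""
    return f"[model response to a {len(prompt)}-character augmented prompt]"

def answer(question: str) -> str:
    # Attach retrieved domain context to the prompt before calling the model.
    context = "\n".join(retrieve(question))
    prompt = f"Answer using ONLY the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("How does card-not-present fraud work?"))
```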
Despite these advancements, there are many challenges in operationalizing such models. My team faces these challenges daily, as that surrounding work accounts for more than 95% of what goes into building these systems.
So, that’s all I have for you folks today. These are some of the resources my team in Dublin has been working on, so feel free to check them out. Thank you very much.
Audience Question: Hallucinations are an obvious hurdle in the enterprise adoption of GenAI. Are there any immediate steps an organization can take to mitigate this problem?
Speaker: Yes, the open-book approach is definitely what we’re using to reduce hallucination. Instead of using the model as-is, we provide domain-specific context as part of the prompt. So, if someone asks a question, a retriever component fetches domain-specific data from our corpus and attaches it to the prompt before it’s sent to the model. This ensures that the model formulates answers based on domain-specific data. There’s a paper called Retrieval Augmentation Reduces Hallucination in Conversation that details how this open-book approach reduces hallucination.
Audience Question: What are typical UX research tasks where we could benefit from using large language models?
Speaker: I think it’s already there. I mentioned the instruction data and how it enhances user experience by allowing natural interaction with large language models. ChatGPT solved this by enabling users to ask the model to perform certain tasks, improving the experience of interacting with it. But there’s still a lot of research around improving the relationship between this technology and human users.
Audience Question: What’s the biggest challenge right now?
Speaker: The biggest challenge is still hallucination. Although the open-book approach helps, it doesn’t eliminate the problem entirely. We’re working to reduce it to single-digit percentages by comparing GenAI output with domain-specific data before presenting it to customers. Statistical techniques can be used to ensure that there is minimal discrepancy between the model’s output and the actual data; a simple sketch of such a check follows.
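As an illustration only (the overlap metric and the 0.5 threshold here are assumptions for the sketch, not Mastercard’s actual technique), a post-generation grounding check might look like this:

```python
# Illustrative post-generation check: flag answers whose tokens are poorly
# supported by trusted domain passages. Metric and threshold are assumed.

def support_score(answer: str, sources: list[str]) -> float:
    """Fraction of answer tokens that also appear in the trusted sources."""
    answer_tokens = set(answer.lower().split())
    source_tokens = set(" ".join(sources).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & source_tokens) / len(answer_tokens)

def vet(answer: str, sources: list[str], threshold: float = 0.5) -> str:
    """Return the answer if grounded enough, otherwise escalate for review."""
    if support_score(answer, sources) < threshold:
        return "Escalated: answer not sufficiently supported by domain data."
    return answer

sources = ["Velocity checks flag unusually frequent transactions on a single account."]
print(vet("Velocity checks flag frequent transactions.", sources))
```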
Audience Question: What do you mean by hallucinations? It could mean many things.
Speaker: Hallucination refers to the model generating false information confidently, as if it were accurate. Sometimes, large language models provide information that sounds convincing but isn’t true, which can be misleading for users. Hallucination is essentially about confidently giving false information that doesn’t match the actual data.
Moderator: Awesome. Thank you very much, Ahmed! Let’s all give him a round of applause. Thank you!