Speeding up the machine learning lifecycle to get more from your data

Mark McQuade, Daniel Quach


 

Businesses are realizing the value of using machine learning models to drive better outcomes. Harnessing the predictive power of your data with machine learning models is becoming more critical to business operations, yet 60% of machine learning models never make it to production. Where is it all going wrong?

 

Widespread struggles with AI and machine learning

We conducted a global study in December 2020 and January 2021 on AI and machine learning adoption, usage, benefits, impact and future plans. The study surveyed 1,870 IT leaders in various industries across the Americas, Europe, Asia and the Middle East. The study revealed that the majority of respondents (82%) are still exploring how to implement AI or struggling to operationalize AI and machine learning models.

The research also showed that companies have, on average, four AI and machine learning R&D projects underway, and we know from speaking with customers that most organizations are investing in research and development for model building. However, a disconnect between the operations or data ops teams and the machine learning engineers or data science teams means that many of those models never reach production. Deployment, automation and scalability of machine learning models are frequent sticking points.

 

The challenges of operationalizing machine learning models

Data science teams often struggle to manage models as they pass through the different stages of the machine learning workflow. Moving machine learning models swiftly from a development environment to production is not a data scientist's area of expertise; a DevOps or infrastructure team is typically better equipped to deliver reproducible models and predictions. Reproducing a model's output when moving it from one environment to another is difficult because it requires careful tracking of library versions, data sets, diagnostics, performance monitoring and model drift.

Another common problem is that models tend to multiply across environments and become difficult to keep track of. Data scientists create domain-specific models and run many experiments, starting in a development environment and then moving them along the chain into a testing environment. The result is multiple models running simultaneously across different environments, using different data sets and different hyperparameters, which makes it almost impossible to track a model's lineage. Yet tracking and explaining everything your model is doing, or has done, is one of the most important aspects of governance and regulatory compliance, especially when auditors are involved.
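As a minimal sketch of what lineage tracking involves (illustrative only, not the Model Factory Framework's API), each training run can record a fingerprint of its data, its hyperparameters and its environment, so any model in any environment can be traced back to exactly what produced it:

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def record_lineage(registry, model_name, data_bytes, hyperparams):
    """Append a lineage entry for one training run to an in-memory registry.

    In practice the registry would be a database or experiment-tracking
    service; a plain list keeps the sketch self-contained.
    """
    entry = {
        "model": model_name,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        # Hash of the training data snapshot, so the exact data is identifiable
        "data_fingerprint": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparams": hyperparams,
        # Environment details needed to reproduce the run
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }
    registry.append(entry)
    return entry


registry = []
entry = record_lineage(
    registry,
    model_name="fraud-classifier",
    data_bytes=b"...training data snapshot...",
    hyperparams={"learning_rate": 0.01, "max_depth": 6},
)
print(json.dumps(entry, indent=2))
```

Open source tools such as MLflow provide this kind of run tracking out of the box; the point is that every model version carries its data, parameters and environment with it.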

 

DevOps is not enough

The DevOps culture and application lifecycle management have become standard in the IT industry over the last decade. DevOps emerged to close the gap between an organization's ability to develop application code and its ability to efficiently deploy, test, scale, monitor and update workloads. In application development, mature CI/CD needs are now largely met by standardized tools and established best practices.

Unlike application development, where quality comes from the code itself, the quality of a machine learning model comes largely from the data features used to train it. The importance of these data features cannot be overstated: their quality drives your machine learning model's performance. It is also worth noting that machine learning models are still in their operational infancy.

Additionally, data can change daily: the data behind today's predictions may be significantly different from the data used to train the model a month ago. When that happens, the production model needs to be retrained and returned to the development phase. As a result, a machine learning model's lifecycle is significantly different from an application lifecycle. One of our customers in the fraud space wanted to push a new production model every 24 hours to account for new threats, retraining and redeploying the model daily to absorb any drift in the data. That's impossible to do without a mature solution in place.
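A crude sketch of the kind of check that triggers such a retrain (a hypothetical univariate test, not the customer's actual method): compare the live data's mean for a feature against the training data's mean, and flag drift when the shift is too large.

```python
import statistics


def needs_retraining(train_values, live_values, z_threshold=3.0):
    """Flag drift when the live mean sits more than z_threshold standard
    errors away from the training mean (a simple univariate check).

    Real systems use richer tests (e.g. population stability index or
    KS tests) per feature, but the retrain-trigger idea is the same.
    """
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    live_mu = statistics.mean(live_values)
    stderr = sigma / (len(live_values) ** 0.5)
    return abs(live_mu - mu) / stderr > z_threshold


train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
stable = [10.1, 9.9, 10.3, 10.0, 10.2]
shifted = [14.8, 15.2, 15.0, 14.9, 15.1]

print(needs_retraining(train, stable))   # False: no drift, keep the model
print(needs_retraining(train, shifted))  # True: drift detected, retrain
```

In a mature pipeline, a positive result like this would automatically kick off the retraining and redeployment steps rather than wait for a human to notice degraded predictions.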

 

Introducing the Model Factory Framework

The machine learning lifecycle is complex. There are many steps to an entire machine learning lifecycle such as data ingestion, data analysis, data transformation, data validation, data splitting, model building, model training and model validation. And with all these steps there are associated challenges. This is why we developed the Rackspace Technology Model Factory Framework.
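The steps above can be sketched as a chain of composable stage functions, each passing its results to the next (the names here are hypothetical, not the framework's API):

```python
# Each stage reads from and writes to a shared context dict,
# so stages can be added, removed or reordered independently.

def ingest(ctx):
    ctx["raw"] = [1.0, 2.0, 3.0, 4.0]  # stand-in for real data ingestion
    return ctx


def transform(ctx):
    ctx["features"] = [x * 10 for x in ctx["raw"]]  # feature engineering
    return ctx


def validate(ctx):
    # Data validation gate: fail fast before training on bad data
    assert all(f >= 0 for f in ctx["features"]), "negative feature"
    return ctx


def train(ctx):
    # Stand-in for model training: "model" is just the feature mean here
    ctx["model"] = {"mean": sum(ctx["features"]) / len(ctx["features"])}
    return ctx


PIPELINE = [ingest, transform, validate, train]


def run(pipeline):
    ctx = {}
    for step in pipeline:
        ctx = step(ctx)
    return ctx


result = run(PIPELINE)
print(result["model"])  # {'mean': 25.0}
```

Structuring the lifecycle this way is what makes it automatable: each stage is an explicit, repeatable unit that a pipeline orchestrator can run, retry and audit.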

The Model Factory Framework is built on AWS, using open source tools that enable rapid development, training, scoring and deployment of models. It was built to address the problems teams face when taking machine learning models from development to production.

The Model Factory Framework simplifies the whole machine learning lifecycle — which usually has over 25 steps and can take months — to around 10 steps that can be completed within a matter of weeks.

 

Learn more about the Model Factory Framework

If you would like to learn about the Rackspace Technology Model Factory Framework in more detail and explore how it improves processes — from model development to deployment, monitoring and governance — view our webinar, “Automating Production Level ML Operations on AWS.” In this webinar we'll cover:

  • Introduction to MLOps Foundations powered by Model Factory
  • The gap between data scientists and ML operations
  • The distinction between MLOps and DevOps
  • Architecture patterns necessary for elements of effective MLOps
  • How a “model factory” architecture holistically addresses CI/CD for ML

 
