⚪️🔵 Edge#82: Fiddler is Bringing ML Monitoring to Enterprises
What’s New in AI, a deep dive into one of the freshest research papers or technology frameworks worth your attention. Our goal is to keep you up to date with new developments in AI in a way that complements the concepts we are debating in other editions of our newsletter.
💥 What’s New in AI: Fiddler is Bringing Machine Learning Monitoring to Enterprises
Monitoring and explainability are some of the toughest challenges faced in real-world machine learning implementations. The composition and lifecycle of machine learning programs are fundamentally different from those of other software technologies. As a result, most of the monitoring, debugging, and interpretability tools we use in traditional software architectures are of little use when it comes to machine learning. On top of that, modern deep neural networks remain a black box for most data scientists, which sometimes hinders their adoption in mission-critical systems. Solving monitoring and interpretability remains a pivotal challenge for the mainstream adoption of machine learning solutions, especially in enterprise environments. Among the platforms tackling this important challenge, Fiddler AI stands out as one of the most complete and innovative technology stacks in the market.
Each relevant architecture in the history of the software industry has sparked the creation of monitoring platforms that accompanied its evolution. The networking and distributed computing era powered companies such as Computer Associates and BMC, which dominated the application performance monitoring (APM) landscape for decades until the emergence of the cloud, when the baton was passed to companies like New Relic and AppDynamics. Machine learning takes this challenge to a new level, given that we are not only talking about a new runtime architecture but a completely different structure for the programs living in that architecture.
The Challenges of Machine Learning Performance Monitoring
Effective monitoring of ML models is a tough challenge that becomes especially acute once machine learning systems operate in the real world. While there are plenty of challenges associated with machine learning model monitoring and explainability, most of them can be summarized in the following categories:
Biases: Machine learning models are a natural mechanism for amplifying data bias. Quantifying the impact of bias in the outputs of machine learning models is far from trivial.
Real-Time Model Performance Without Labels: Quantifying performance during the training of machine learning models is relatively easy, as everything can be reconciled back to labeled datasets. That picture looks very different once the models are deployed to production and need to operate against unlabeled datasets (a common workaround is sketched right after this list).
Real-Time Performance Metric Calculations: Quantifying performance metrics of machine learning models is relatively expensive from a computational standpoint. As a result, it is difficult to maintain a real-time performance view of machine learning models.
Auto-Retraining: A prevalent way to address the performance decay in machine learning models is to train a new version with an updated dataset and deploy it side by side with the current model. That process is certainly effective but presents a challenge from the model monitoring standpoint, as the performance metric of the new model can be drastically different from that of its previous version.
Accuracy vs. Interpretability: Finally, one of the most famous dilemmas in the current machine learning ecosystem is the friction between model accuracy and interpretability. In general, models that are relatively easy to interpret might not be super accurate, while more accurate, complex deep learning models prove to be incredibly difficult to monitor and explain.
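To make the unlabeled-performance challenge concrete: a common workaround is to monitor the distribution of the model's live scores against a training-time baseline and alert when they diverge, since a shifting score distribution often precedes a drop in accuracy. Below is a minimal Python sketch using the population stability index (PSI); the data and the 0.2 threshold are illustrative conventions, not Fiddler's method.

```python
import numpy as np

def psi(baseline, live, bins=10, eps=1e-6):
    """Population Stability Index between a baseline (training-time)
    score distribution and a window of live, unlabeled predictions."""
    # Bin edges come from the baseline so both histograms are comparable.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline) + eps
    live_pct = np.histogram(live, bins=edges)[0] / len(live) + eps
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

# Hypothetical usage: scores logged at training time vs. the last hour in production.
baseline_scores = np.random.beta(2, 5, 10_000)   # stand-in for training-time scores
live_scores = np.random.beta(2, 3, 1_000)        # stand-in for production scores
drift = psi(baseline_scores, live_scores)
# A common rule of thumb: PSI > 0.2 signals meaningful drift worth an alert.
print(f"PSI = {drift:.3f}")
```

In a real deployment, a statistic like this would be computed over rolling windows of predictions and wired into an alerting pipeline rather than evaluated once.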
Addressing these challenges requires monitoring and explainability platforms that are highly tailored to the machine learning space. Adapting traditional APM stacks to monitor machine learning models has proven to be a largely fruitless effort. Instead, a new generation of companies has emerged to enable monitoring and explainability as a first-class component of machine learning programs. The evolution of APM capabilities toward machine learning workloads has come to be known as model performance management (MPM) and has become one of the pillars of the MLOps movement. Conceptually, MPM enables the key building blocks to track and monitor the end-to-end lifecycle of machine learning models.
Some of the principles of MPM are beautifully outlined in “Introducing Model Performance Management,” authored by the Fiddler team. Fiddler has been one of the pioneers in the MPM space, adapting many of its concepts to modern machine learning stacks.
The Fiddler Platform
Fiddler is one of the undisputed early leaders in the machine learning monitoring space. The Fiddler platform enables a foundational set of capabilities to streamline the interpretability and monitoring of machine learning models. One of the key differentiators of Fiddler is that it does not take a static view. Instead, it enables visibility and interpretability across the various stages of the lifecycle of machine learning models, from training to deployment.
Image credit: Fiddler
The Fiddler platform was built around the simple principle of enabling “explainable monitoring” for machine learning models. To achieve that, Fiddler can seamlessly enable monitoring capabilities across many machine learning frameworks such as TensorFlow, PyTorch, and Scikit-learn.
Similarly, Fiddler supports first-class integration with a large number of mainstream data sources commonly used in machine learning pipelines. This capability is important because it lets Fiddler tie model performance metrics all the way back to the source training dataset. For instance, the platform natively detects, and helps resolve, data drift that can degrade the performance of machine learning models.
Image credit: Fiddler
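Fiddler's drift tooling is proprietary, but the general idea of tying live traffic back to the training set can be sketched generically: run a per-feature statistical test between the training distribution and a recent production window. The helper below uses SciPy's two-sample Kolmogorov-Smirnov test; the function name and the commented usage are hypothetical, not Fiddler's API.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def feature_drift_report(train_df, live_df, alpha=0.01):
    """Flag numeric features whose live distribution has drifted from
    training, using a two-sample Kolmogorov-Smirnov test per column."""
    rows = []
    for col in train_df.select_dtypes(include=np.number).columns:
        stat, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        rows.append({"feature": col, "ks_stat": stat,
                     "p_value": p_value, "drifted": p_value < alpha})
    # Most-drifted features first, so an on-call engineer sees the worst offenders.
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)

# Hypothetical usage against a production window:
# report = feature_drift_report(training_features, last_24h_features)
# print(report[report.drifted])
```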
Another key building block of the Fiddler platform is outlier detection. Anomalies and outliers are among the top factors influencing performance degradations in machine learning models. Fiddler’s monitoring engine allows one to easily detect outliers and generate one-click explanations of the causes of anomalies.
Image credit: Fiddler
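The internals of Fiddler's outlier engine are not described here, but scikit-learn (one of the frameworks Fiddler integrates with) offers a reasonable generic stand-in. The sketch below fits an IsolationForest on a reference window of model inputs and flags anomalous incoming events; all of the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in for logged model inputs: mostly normal traffic plus a few anomalies.
normal = rng.normal(0, 1, size=(1000, 4))
anomalies = rng.normal(6, 1, size=(5, 4))
events = np.vstack([normal, anomalies])

# Fit on a reference window, then score incoming events; -1 marks outliers.
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)
labels = detector.predict(events)
print(f"{(labels == -1).sum()} of {len(events)} events flagged as outliers")
```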
Another key area of Fiddler’s innovation is explainability. Unlike other stacks in the market, Fiddler provides interpretability interfaces for both technical and non-technical users. For data scientists and machine learning engineers, the platform offers a visual interface for conducting feature attribution and sensitivity analysis. A very nice complement is Fiddler’s simpler model interpretability interface for non-technical users.
Image credit: Fiddler
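Fiddler surfaces attribution through its visual interface; the underlying idea can be approximated in code with scikit-learn's permutation importance, which measures how much shuffling each feature degrades held-out performance. This is a generic illustration on a public dataset, not Fiddler's attribution method.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature and measure the drop in held-out accuracy:
# the bigger the drop, the more the model relies on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```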
One of the biggest nightmares of machine learning explainability is that many of the techniques vary depending on the target data type, such as text, tabular, or image. This problem is compounded when models take inputs of different types: a model that accepts both text and image inputs can be drastically harder to interpret than one with inputs of a single type. Fiddler addresses this challenge with a capability known as hybrid explainability, which provides a very clean interface for generating explanations for models with different feature types such as numeric, text, and, very soon, images. This is an area in which many interpretability tools fall short when applied to real-world multi-input machine learning models. Fiddler’s hybrid explainability interface can generate simple explanations for different data feature types using a consistent user experience.
Image credit: Fiddler
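Fiddler's hybrid explainability is its own technology, but the core difficulty it addresses, explaining a model whose inputs mix types, can be illustrated with a scikit-learn pipeline that maps text and numeric columns into one feature space so a single attribution pass (here, linear coefficients) covers them all. The dataset and column names below are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical mixed-type dataset: one free-text column, two numeric columns.
df = pd.DataFrame({
    "ticket_text": ["refund please", "love the product", "app crashes daily",
                    "great support", "charged twice", "works perfectly"],
    "account_age_days": [30, 400, 12, 900, 45, 600],
    "monthly_spend": [9.0, 49.0, 9.0, 99.0, 19.0, 49.0],
})
y = [1, 0, 1, 0, 1, 0]  # 1 = churn risk

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "ticket_text"),          # text -> tf-idf features
    ("nums", StandardScaler(), ["account_age_days", "monthly_spend"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())]).fit(df, y)

# One unified feature space: coefficients now cover text and numeric inputs alike.
names = model.named_steps["prep"].get_feature_names_out()
coefs = model.named_steps["clf"].coef_[0]
for name, coef in sorted(zip(names, coefs), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {coef:+.2f}")
```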
Bias and fairness monitoring is one of the emerging areas of research in machine learning interpretability and one that is particularly hard to crack. Fiddler is one of the first platforms to incorporate bias and fairness indicators for both models and datasets in a very intuitive user interface that helps data scientists actively mitigate these factors throughout the lifecycle of machine learning solutions.
Image credit: Fiddler
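Fiddler's fairness indicators live in its UI; the flavor of metric involved can be shown with a simple disparate-impact check over logged model decisions. The groups and approval counts below are hypothetical.

```python
import pandas as pd

# Hypothetical logged decisions: model approvals split by a protected attribute.
decisions = pd.DataFrame({
    "group":    ["A"] * 100 + ["B"] * 100,
    "approved": [1] * 60 + [0] * 40 + [1] * 42 + [0] * 58,
})

rates = decisions.groupby("group")["approved"].mean()
disparate_impact = rates.min() / rates.max()

print(rates)
print(f"Disparate impact ratio: {disparate_impact:.2f}")
# A common heuristic (the "four-fifths rule") flags ratios below 0.8 for review.
```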
In addition to its robust set of machine learning monitoring and explainability capabilities, Fiddler created a series of enterprise-grade features such as alerting, collaboration, and access control that make the platform very well equipped for enterprise environments. In its initial phase, Fiddler has seen strong adoption in mission-critical machine learning applications within companies in sectors such as finance and technology.
Conclusion
From CA to New Relic, APM systems have been an integral part of the evolution of the software industry, and machine learning will be no different. The unique characteristics of machine learning applications require a new form of monitoring and interpretability platform. The Fiddler platform has been one of the most active innovators in this nascent phase of the machine learning monitoring space. Combining explainability and monitoring capabilities with cutting-edge ideas such as bias and fairness analysis, along with a strong set of collaboration features, makes Fiddler one of the most attractive offerings for enterprises and startups looking to incorporate monitoring and interpretability into their machine learning systems.
🧠 The Quiz
Every ten quizzes we reward two random people. Participate! The question is the following:
What is the role of Fiddler’s hybrid explainability capability?