🗄 A Model Compression Library You Need to Know About
Weekly news digest curated by the industry insiders
The machine learning (ML) space is currently dominated by large models whose computational requirements are out of reach for most organizations. Model compression is one of the disciplines addressing that challenge by creating smaller models without sacrificing accuracy. Despite the obvious need, model compression remains a challenge for ML engineering teams, as most frameworks in the space are relatively nascent. As a result, you rarely hear about ML engineering pipelines that incorporate model compression as a native building block. Quite the opposite: model compression tends to be one of those things you only consider once the problem is too big to ignore; literally 😉
Last week, Microsoft Research open-sourced a new framework that attempts to streamline compression in deep learning models. DeepSpeed Compression is part of the DeepSpeed platform, which is aimed at addressing the challenges of large-scale AI systems. The framework provides a catalog of common model compression techniques abstracted behind a consistent programming model. Initial experiments showed compression rates of up to 32x in large transformer architectures such as BERT. If DeepSpeed Compression follows the path of other frameworks in the DeepSpeed family, it could be productized as part of the Azure ML platform and streamline the adoption of compression methods in deep learning architectures. DeepSpeed Compression is definitely a framework for the ML engineering community to follow.
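To make the idea concrete, here is a minimal, framework-agnostic sketch of one of the most common compression techniques: post-training 8-bit weight quantization. This is an illustration of the general principle only, not the DeepSpeed Compression API; function names and values are hypothetical.

```python
# Minimal sketch of symmetric 8-bit post-training weight quantization.
# Illustrative only -- not the DeepSpeed Compression API.

def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [round(w / scale) for w in weights]  # each value now fits in one byte
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.0, 0.89]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each int8 value takes 1 byte instead of 4 for float32 -- a 4x size
# reduction -- at the cost of a small rounding error per weight.
```

Techniques like quantization, pruning, and distillation all trade a small amount of accuracy for large reductions in model size and inference cost; frameworks like DeepSpeed Compression package such methods behind a single interface.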
🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻
🗓 Next week in TheSequence Edge:
Edge#211: we discuss what to test in ML models; explain how Meta uses A/B testing to improve Facebook’s newsfeed algorithm; explore Meta’s Ax, a framework for A/B testing in PyTorch.
Edge#212: we dive deep inside the Masterful CLI Trainer, a low-code CV model development platform.
Now, let’s review the most important developments in the AI industry this week
🔎 ML Research
Generalist Reinforcement Learning Agents
Google Research published a paper unveiling a generalist reinforcement learning agent that can play many video games simultaneously →read more on Google Research blog
Outlier Root Cause Analysis
Amazon Research published a paper outlining a technique to detect the root causes of statistical outliers →read more on Amazon Research blog
Reinforcement Learning for Program Synthesis
Salesforce Research published a paper and open-sourced code for CodeRL, a reinforcement learning framework for program synthesis →read more on Salesforce Research blog
The Algorithms Behind Transformers
DeepMind published a research paper detailing the algorithms and mathematical foundations of transformer architectures →read more in the original research paper from DeepMind
☝️ We Recommend – Join this webinar and discover the Hopsworks 3.0 release!
In this talk, Hopsworks' VP of Engineering will explore the new capabilities in the Hopsworks feature store 3.0 and how it can help data scientists who love Python manage their features for training and serving models. He will also cover native Python support for feature engineering, feature pipelines, feature views that represent models in the feature store, transformation functions, and data validation with Great Expectations. Join us on Aug 3 at 7 PM CEST.
🤖 Cool AI Tech Releases
Microsoft Research open-sourced DeepSpeed Compression, a framework for compression and system optimization in deep learning models →read more on Microsoft Research blog
OpenAI expanded the availability of DALL-E to over a million people on the waitlist →read more on OpenAI blog
New Tools and Frameworks for Alexa
Amazon unveiled a series of new developer frameworks and tools for Alexa that improve developers’ and device makers’ experience →read more on Amazon Developer blog
PyTorch open-sourced the PlayTorch app to streamline the development of mobile AI experiences →read more on PyTorch blog
🛠 Real World ML
Out of Memory Predictions at Netflix
Netflix discussed the architecture powering the ML models it uses to predict out-of-memory errors in TVs and set-top boxes →read more on Netflix tech blog