🗄 A Model Compression Library You Need to Know About

Weekly news digest curated by industry insiders

Jul 24, 2022

📝 Editorial

The machine learning (ML) space is currently dominated by large models whose computational requirements are often out of reach for most organizations. Model compression is one of the disciplines targeting that challenge by creating smaller models without sacrificing accuracy. Despite the obvious need, model compression remains a challenge for ML engineering teams, as most frameworks in the space are relatively nascent. As a result, you rarely hear about ML engineering pipelines that incorporate model compression as a native building block. On the contrary, model compression tends to be one of those things that you only consider once the problem is too big to ignore; literally 😉

Last week, Microsoft Research open-sourced a new framework that attempts to streamline compression in deep learning models. DeepSpeed Compression is part of the DeepSpeed platform, which aims to address the challenges of large-scale AI systems. The framework provides a catalog of common model compression techniques abstracted behind a consistent programming model. Initial experiments showed up to 32x compression rates in large transformer architectures such as BERT. If DeepSpeed Compression follows the path of other frameworks in the DeepSpeed family, it could be productized as part of the Azure ML platform and streamline the adoption of compression methods in deep learning architectures. DeepSpeed Compression is definitely a framework for the ML engineering community to follow.

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#211: we discuss what to test in ML models; explain how Meta uses A/B testing to improve Facebook’s newsfeed algorithm; explore Meta’s Ax, a framework for A/B testing in PyTorch.

Edge#212: we dive deep inside the Masterful CLI Trainer, a low-code CV model development platform.

Now, let’s review the most important developments in the AI industry this week

🔎 ML Research

Generalist Reinforcement Learning Agents

Google Research published a paper unveiling a generalist reinforcement learning agent that can play many video games simultaneously →read more on Google Research blog

Outlier Root Cause Analysis

Amazon Research published a paper outlining a technique to detect the root causes of statistical outliers →read more on Amazon Research blog

CodeRL

Salesforce Research published a paper and open-sourced code for CodeRL, a reinforcement learning framework for program synthesis →read more on Salesforce Research blog

The Algorithms Behind Transformers

DeepMind published a research paper detailing the algorithms and mathematical foundations of transformer architectures →read more in the original research paper from DeepMind

☝️ We Recommend – Join this webinar and discover the Hopsworks 3.0 release!

In this talk, the Hopsworks VP of Engineering will explore new capabilities in Hopsworks feature store 3.0 and how it can help data scientists who love Python manage their features for training and serving models. He will also cover native Python support for feature engineering, feature pipelines, feature views that represent models in the feature store, transformation functions, and data validation with Great Expectations. Join us on Aug 3 at 7 PM CEST.

🤖 Cool AI Tech Releases

DeepSpeed Compression

Microsoft Research open-sourced DeepSpeed Compression, a framework for compression and system optimization in deep learning models →read more on Microsoft Research blog

DALL-E Beta

OpenAI expanded the availability of DALL-E to over a million people on the waitlist →read more on OpenAI blog

New Tools and Frameworks for Alexa

Amazon unveiled a series of new developer frameworks and tools for Alexa that improve developers’ and device makers’ experience →read more on Amazon Developer blog

PlayTorch App

PyTorch open-sourced the PlayTorch app to streamline the development of mobile AI experiences →read more on PyTorch blog

🛠 Real World ML

Out of Memory Predictions at Netflix

Netflix discusses the architecture powering ML models used to predict memory capacity errors in TVs and set-top boxes →read more on Netflix tech blog

💸 Money in AI

Digital experience analytics company Contentsquare raised $600 million in a Series F growth investment round led by Sixth Street Growth. Hiring across the globe.
Surgical intelligence company Theator raised $24 million in an extension of its Series A funding round led by Insight Partners. Hiring in Tel Aviv/Israel and Palo Alto/US.
AI/ML security startup HiddenLayer emerged from stealth after raising $6 million in seed funding led by Ten Eleven Ventures. Hiring remote.
Graph database company Dgraph raised a $6 million seed re-financing round led by Venrock and Uncorrelated Ventures.
Healthcare data analysis startup Cornerstone AI raised a $5 million seed round led by Healthy Ventures.

TheSequence