👷‍♀️🧑🏻‍🎓👩‍💻👨🏻‍🏫 The MoE Momentum
Weekly news digest curated by industry insiders
Massively large neural networks seem to be the pattern to follow these days in deep learning. The size and complexity of models are reaching unprecedented levels, particularly in models that try to master multiple tasks. Such large models are not only difficult to understand but also incredibly expensive to train and run. In recent years, Mixture of experts (MoE) has emerged as one of the most efficient techniques to build and train large multi-task models. While MoE is not a novel ML technique, it has certainly experienced a renaissance with the rapid emergence of massively large deep learning models.
Conceptually, MoE is rooted in the simple idea of decomposing a large multi-task network into smaller expert networks, each of which can master an individual task. This might sound similar to ensemble learning, but the big difference is that an MoE model activates only one expert (or a small subset of experts) for any given input. The greatest benefit of MoE models is that their computation costs scale sub-linearly with their size. As a result, MoE has become one of the most adopted architectures for large-scale models. Just this week, Microsoft and Google Research published papers outlining techniques to improve the scalability of MoE models. As big ML models continue to dominate the deep learning space, MoE techniques are likely to become more mainstream in real-world ML solutions.
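To make the routing idea concrete, here is a minimal, hypothetical sketch of an MoE layer with top-1 routing in PyTorch (toy code for illustration only, not any particular paper's implementation): a learned gate sends each token to a single expert, so adding more experts grows capacity without growing per-token compute.

```python
# Toy Mixture-of-Experts layer with top-1 routing (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int, d_hidden: int):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate (router) scores every expert for each input token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                          # (num_tokens, num_experts)
        expert_idx = scores.argmax(dim=-1)             # top-1: one expert per token
        gate_prob = F.softmax(scores, dim=-1).gather(1, expert_idx.unsqueeze(1))
        out = torch.zeros_like(x)
        # Only the selected expert runs for each token, so compute grows with the
        # number of tokens, not with the total number of experts (the sub-linear point).
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out * gate_prob

# Usage: 8 experts of capacity, but each token only pays for one expert's FLOPs.
layer = TinyMoE(d_model=16, num_experts=8, d_hidden=32)
tokens = torch.randn(10, 16)
print(layer(tokens).shape)  # torch.Size([10, 16])
```

Production MoE systems add load-balancing losses, expert-capacity limits, and expert parallelism across devices, but the routing mechanism above is the core idea.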
🔺🔻 TheSequence Scope is our free Sunday digest. To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻
🗓 Next week in TheSequence Edge:
Edge#159: we recap our MLOps series (two parts!);
Edge#160: we deep dive into Aporia, an ML observability platform.
Now, let’s review the most important developments in the AI industry this week.
🔎 ML Research
Data2vec: One Model for Speech, Vision, and Language
Meta (Facebook) AI Research (FAIR) published a paper unveiling data2vec, a self-supervised learning method that works across speech, language, and computer vision tasks →read more on FAIR blog
MoE Task Routing
Google Research published a paper introducing TaskMoE, a technique to extract smaller, more efficient subnetworks from large multi-task models based on Mixture of experts (MoE) architectures →read more on Google Research blog
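As a rough illustration of the task-routing idea (toy code with assumed names, not Google's TaskMoE implementation): if routing decisions depend on the task rather than on individual tokens, every input for a given task hits the same small set of experts, and that subset can be extracted as a standalone, much smaller network for serving.

```python
# Toy sketch of task-level routing and per-task subnetwork extraction.
import torch
import torch.nn as nn

class TaskRoutedMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int, num_tasks: int, k: int = 2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_experts))
        # Routing depends on the task id, not on the individual token,
        # so all inputs of a task share the same k experts.
        self.task_gate = nn.Embedding(num_tasks, num_experts)

    def experts_for_task(self, task_id: int) -> list[int]:
        scores = self.task_gate.weight[task_id]
        return scores.topk(self.k).indices.tolist()

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        idx = self.experts_for_task(task_id)
        # Average the selected experts' outputs (a simplification of real gating).
        return torch.stack([self.experts[i](x) for i in idx]).mean(dim=0)

    def extract_subnetwork(self, task_id: int) -> nn.ModuleList:
        # Keep only the experts this task actually routes to; the rest of the
        # much larger multi-task model can be dropped for deployment.
        return nn.ModuleList(self.experts[i] for i in self.experts_for_task(task_id))

moe = TaskRoutedMoE(d_model=16, num_experts=32, num_tasks=4, k=2)
sub = moe.extract_subnetwork(task_id=1)
print(len(sub))  # 2 experts kept out of 32
```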
DeepSpeed and MoE
Microsoft Research published a detailed blog post explaining how to use its DeepSpeed framework to scale the training of Mixture of experts (MoE) models →read more on Microsoft Research blog
StylEx – Visual Interpretability of Classifiers
Google Research published a paper proposing StylEx, a method to visualize the influence that individual visual attributes have on the output of image classifiers →read more on Google Research blog
🤖 Cool AI Tech Releases
Macaw vs. GPT-3
The Allen Institute for AI (AI2) open-sourced a demo solution that compares its Macaw model against OpenAI’s GPT-3 →read more on AI2 blog
🛠 Real World ML
AI Fairness at LinkedIn
The LinkedIn engineering team published some details about how it integrates fairness as a first-class citizen in its AI products →read more on LinkedIn Engineering blog
💸 Money in AI
Healthcare customer support platform BirchAI (a spinout from the Allen Institute for AI (AI2), our long-term partner) raised $3.1 million in seed financing led by Radical Ventures. You can read our interview with BirchAI’s CTO here. Hiring in Seattle/US.