Distributed ML Training is the Problem Everyone is Going to Have
The Scope covers the most relevant ML papers, real-world ML use cases, cool tech releases, and $ in AI. Weekly.
Editorial
Large-scale, distributed training is one of those machine learning (ML) problems that is easy to ignore. After all, only large AI labs like Google, Facebook, and Microsoft work with massive models that require many GPUs to train. I certainly thought that way until transformers came into the picture. If there is one takeaway from the emergence of transformer models, it is that bigger models are better, at least for the time being. Training even a basic BERT-based transformer model requires quite a bit of infrastructure and distributed processing. As a result, distributed training is slowly becoming a mainstream problem for the entire AI community.
As someone who didn't care much about distributed ML training, I followed the research peripherally without getting into the details. That changed in the last couple of years, as I started playing with larger and larger models. The level of research and engineering built into distributed ML training frameworks is mind-blowing. Frameworks like Horovod and Ray are certainly better known, but the innovation doesn't stop there. Just this week, Microsoft open-sourced new additions to its DeepSpeed distributed training library, while Facebook and Tencent published advanced research on scaling the distributed training of transformer models. Innovation in this space will certainly continue over the next few years, and, at this point, distributed training should be considered a key building block of any modern ML pipeline.
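The core idea behind data-parallel frameworks such as Horovod can be illustrated without any GPUs: each worker computes gradients on its own shard of the data, the gradients are averaged across workers (an all-reduce), and every worker then applies the same update. Below is a minimal NumPy simulation of that loop; all names (`local_gradient`, `allreduce_mean`) are illustrative and not any framework's actual API.

```python
import numpy as np

# Toy data-parallel setup: linear regression with loss ||Xw - y||^2 / n,
# sharded across simulated "workers". Purely illustrative, not Horovod's API.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

def local_gradient(w, X_shard, y_shard):
    """Gradient of the mean-squared error on one worker's data shard."""
    n = len(y_shard)
    return 2.0 / n * X_shard.T @ (X_shard @ w - y_shard)

def allreduce_mean(grads):
    """Stand-in for an all-reduce: average the per-worker gradients."""
    return np.mean(grads, axis=0)

w = np.zeros(3)
shards = np.array_split(np.arange(len(y)), 4)  # 4 simulated workers
for _ in range(200):
    grads = [local_gradient(w, X[idx], y[idx]) for idx in shards]
    w -= 0.1 * allreduce_mean(grads)  # every worker takes the same step
```

With equal-size shards, the averaged gradient is exactly the full-batch gradient, which is why data-parallel training reproduces single-machine SGD while splitting the compute.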
TheSequence Scope, our Sunday edition with the industry's development overview, is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge.
Next week in TheSequence Edge:
Edge#117: we discuss how transformers expand beyond natural language processing (NLP) into computer vision; talk about ImageGPT, OpenAI's adaptation of their GPT model to vision scenarios; and explore the Hugging Face library, one of the few frameworks that includes transformer models for computer vision.
Edge#118: we overview WhyLabs, an end-to-end AI observability and monitoring platform that enables transparency across the different stages of ML pipelines.
Now, let's review the most important developments in the AI industry this week.
ML Research
PipeTransformer: Scaling Distributed Training
Researchers from Facebook, Tencent, and the University of Southern California published a paper proposing PipeTransformer, a PyTorch-based framework for elastic distributed training of transformer models ->read more on PyTorch blog
Causality in Time Series Datasets
Amazon Research published a paper detailing a method for detecting causal features in a time series dataset ->read more on Amazon Research blog
Tracing Cell Lineage
IBM Research published a paper discussing ML methods that can be used to reconstruct cell lineage trees ->read more on IBM Research blog
Real World ML
Managing Big Data Hardware Resources at Uber
The Uber engineering team published an analysis of the infrastructure and processes used to manage hardware resources in their big data and AI solutions ->read more on Uber blog
Cool AI Tech Releases
DeepSpeed MoE
Microsoft Research introduced DeepSpeed mixture of experts (MoE), an addition to the DeepSpeed library that enables the training of massively large MoE models ->read more on Microsoft Research blog
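Mixture-of-experts layers scale parameter counts cheaply because each token activates only a few experts rather than the whole network. The following is a toy NumPy sketch of top-k gating to illustrate the idea; it is not DeepSpeed's implementation, and every name in it is hypothetical.

```python
import numpy as np

# Toy mixture-of-experts (MoE) layer: a gating network routes each token
# to its top-2 experts and combines their outputs. Illustrative only.
rng = np.random.default_rng(1)
d_model, n_experts, top_k = 4, 8, 2

W_gate = rng.normal(size=(d_model, n_experts))            # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """Route each token to its top-k experts; only those experts run."""
    probs = softmax(tokens @ W_gate)            # (n_tokens, n_experts)
    out = np.zeros_like(tokens)
    for i, (p, tok) in enumerate(zip(probs, tokens)):
        top = np.argsort(p)[-top_k:]            # indices of the top-k experts
        weights = p[top] / p[top].sum()         # renormalized gate weights
        for wt, e in zip(weights, top):
            out[i] += wt * (tok @ experts[e])   # weighted expert output
    return out

tokens = rng.normal(size=(5, d_model))
out = moe_layer(tokens)                         # same shape as the input
```

Because only `top_k` of the `n_experts` expert matrices run per token, total parameters can grow with the number of experts while per-token compute stays roughly constant, which is what makes massively large MoE models trainable.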
TensorFlow Lite MoveNet
TensorFlow unveiled a version of its MoveNet pose detection library optimized for TensorFlow Lite ->read more on TensorFlow blog
Useful Tweet
Quanta Magazine is always an amazing read.
Money in AI
For ML and dev teams:
GraphQL company Apollo raised $130 million in a Series D round at a more than $1.5 billion valuation. Insight Partners led the round. Hiring.
Code search and navigation tool Sourcegraph raised $125 million in a Series D funding round at a valuation of $2.6 billion. Andreessen Horowitz led the round. Hiring remotely.
Data observability startup Monte Carlo raised $60 million in funding led by Iconiq Growth. Hiring across all teams.
Data exploration and visualization platform Preset raised $35.9 million in Series B funding led by Redpoint Ventures. Hiring in California, US.
Open-source business intelligence company Metabase raised a $30 million Series B round led by Insight Partners. Many remote job openings.
AI-powered:
Predictive airfare and hotel rate platform Hopper raised $175 million in a Series G round led by GPI Capital. Hiring across the globe.
Call center automation startup Balto raised $37.5 million in a Series B funding round led by Stripes. Hiring in Missouri, US, or remote.
Robotic machine operator creator Rapid Robotics raised $36.7 million in a Series B round co-led by Kleiner Perkins and Tiger Global. Hiring in San Francisco, US.
Insurance verification startup TrustLayer raised a $15.1 million Series A round led by Craft Ventures. Hiring in Tampa, LA, or remote.
Design tool Uizard raised $15 million in funding from Insight Partners. Hiring remotely.
Enterprise compliance automation platform Regology raised an $8 million Series A round led by Acme Capital.
Health tech AI startup Cardiomatics raised a $3.2 million seed round led by Central and Eastern European VC Kaya. Hiring in Poland.
Machine learning tech company SwoopTalent was acquired by SAP. The companies agreed not to disclose the purchase price or other financial details of this transaction.