One of the cognitive marvels of the human brain is its ability to simultaneously process information from different sensory inputs such as speech, touch, or vision in order to accomplish a specific task. From infancy, we learn to build representations of the world from many different modalities: objects we see, sounds we hear, verbal descriptions, and more. Recreating the ability to learn from different modalities simultaneously has long been a goal of ML, but most of those efforts remained constrained to research exercises. For decades, most supervised ML models have been highly optimized for a single representation of the information. That’s rapidly changing now. Multimodal ML is becoming a reality.
In the last two years, we have seen the emergence of multimodal ML models applied to real-world scenarios. Natural language and computer vision have proven a powerful combination with the release of models such as OpenAI’s DALL-E or NVIDIA’s GauGAN. This week, Meta AI Research released a new model that combines audio and visual inputs to improve speech recognition. The model uses self-supervision techniques to analyze lip movements from unlabeled videos. That idea would have sounded insane a handful of years ago. While there are still plenty of milestones to reach in individual deep learning modalities, multimodal learning is an essential step towards the goal of building general AI. Little by little, such steps are making it more and more real.
🔺🔻 TheSequence Scope is our Sunday free digest. To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻
🗓 Next week in TheSequence Edge:
Edge#155: we discuss A/B Testing for ML Models; we explore how Meta AI uses ML A/B testing to improve its news feed ranking; we overview W&B, one of the top ML experimentation platforms on the market.
Edge#156: we deep dive into the ML mechanisms that power recruiting recommendations at LinkedIn.
Now, let’s review the most important developments in the AI industry this week.
🔎 ML Research
Audio-Visual Models for Speech Recognition
Meta AI Research (FAIR) published a paper proposing a technique that uses both audio and vision to better understand speech →read more on the FAIR team blog
Researchers from DeepMind and Harvard University proposed Hidden Agenda, a two-team social deduction game used to help reinforcement learning agents develop cooperative mechanics →read more in the original research paper
Transformers and Semi-Supervised Learning for Video
Amazon Research published two papers about novel video intelligence techniques powered by transformers and self-supervised learning →read more on Amazon Research blog
Training Rescoring for Speech Recognition
Staying with Amazon Research: the tech giant published a paper proposing an NLU-based method for rescoring the training of speech recognition models used in the Alexa digital assistant →read more on Amazon Research blog
🤖 Cool AI Tech Releases
NVIDIA Canvas, the company’s generative art toolset, got a few updates this week →read more on NVIDIA blog
NVIDIA Omniverse is a newly announced studio for creating virtual worlds →read more on NVIDIA blog
🛠 Real World ML
Noted computer scientist Chip Huyen published a post detailing common challenges and solutions in real-time ML systems →read more in Huyen’s original post
🐦 Follow us on Twitter where we share all our recommendations in bite-sized form
💸 Money in AI