📝 Editorial
Generative models based on textual inputs are experiencing tremendous momentum. Models such as DALL-E, Midjourney, and Stable Diffusion have captured the imagination of not only the AI community but also artists, designers, gamers, and creative minds across many different domains. When thinking about the next milestone for text-to-image synthesis models, video creation is often cited at the top of the list. Obviously, video generation presents significant challenges compared to static images. For starters, video requires significantly more training resources, and there are very few high-quality datasets that work with supervised methods. Also, the feature representation space of videos is considerably more complex than that of images. Just like text-to-image, text-to-video has recently turned to unsupervised pretraining methods. A few days ago, Meta AI took a very important step in advancing text-to-video synthesis with the unveiling of Make-A-Video, a model able to generate high-quality videos from textual inputs.
Make-A-Video follows the announcement of Make-A-Scene, a photorealistic text-to-image synthesis model. Make-A-Video learns the correspondence between text, visuals, and movement from unsupervised video data. Arguably, the biggest contribution of Make-A-Video is that the model doesn’t require text-video pairs for its training. Just from processing large amounts of video content, Make-A-Video is able to infer how different objects move and interact. Part of this innovation comes from leveraging text-image priors. Meta AI hasn’t released a version of Make-A-Video, as it is still working through the ethical concerns around these types of models, but the website indicates that a limited release might be available soon. Make-A-Video is an indication that a new wave of text-to-video synthesis models is around the corner.
🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻
🗓 Next week in TheSequence Edge:
Edge#231: we explore Text-to-image synthesis with GANs; discuss Google’s XMC-GAN, a modern approach to text-to-image synthesis; explore NVIDIA GauGAN2 Demo.
Edge#232: we deep dive into DeepMind’s new method for discovering when an agent is present in a system.
📌 Feature Store Summit 2022: A free conference on Feature Engineering
We highly recommend the second Feature Store Summit, a free online conference on feature engineering and managing data for AI, organized by Hopsworks. Join them on October 11th, 2022!
This year's talks and sessions revolve around the theme of 'Accelerating Production Machine Learning with Feature Stores,' featuring speakers from companies such as Uber, LinkedIn, Airbnb, DoorDash, Disney Streaming, and many more. By joining the event, you will hear from people who have seen the good, bad, and ugly sides of feature stores and learn from their experiences. It will help you understand the capabilities of a feature store and the various cutting-edge technologies that facilitate bringing ML models into production, as well as showcase ways to improve your ML platforms.
Now, let’s review the most important developments in the AI industry this week.
🔎 ML Research
Make-A-Video
Meta AI published a paper detailing Make-A-Video, a text-to-video synthesis model that can produce short, high-quality video clips from textual inputs →read more
Fast and Sustainable Reinforcement Learning
Google Research published a paper unveiling ActorQ, a method for accelerating the training and improving the efficiency of RL agents →read more
Alexa’s Interactive Story Creation
Amazon Science published a detailed article about the ML techniques powering Alexa’s new interactive story creation features →read more
AI Systems Performance Evaluation
Microsoft Research published an article detailing the techniques and best practices used to evaluate the performance of the AI systems powering PeopleLens, a solution that supports social interaction for blind children →read more
🤖 Cool AI Tech Releases
BigCode
Hugging Face and ServiceNow Research partnered to launch BigCode, a project that aims to build large language models for coding →read more
SetFit
Hugging Face and Intel Labs open-sourced SetFit, a framework for few-shot fine-tuning of Sentence Transformers →read more
PySyTFF
TensorFlow and OpenMined collaborated to launch PySyTFF, a new framework for privacy-preserving ML →read more
Venice
LinkedIn open-sourced Venice, a derived data platform for high-throughput, low-latency datasets →read more
Freely Available DALL-E
OpenAI enabled access to DALL-E without a waitlist →read more
🛠 Real World ML
Trillion Parameter Scalability at AWS
Amazon Science discusses the techniques and architecture used to scale the training of large ML models to over one trillion parameters →read more
Data Warehousing at Airbnb
Airbnb discusses the architecture and processes used to upgrade their data warehousing infrastructure →read more
💸 Money in AI
ML&AI
AI performance company Arthur raised a $42 million Series B funding round co-led by Acrew Capital and Greycroft Ventures. Hiring in the US.
Edge AI innovator Femtosense raised an $8 million Series A funding round led by Fine Structures Ventures.
AI-powered
Data exchange platform Flatfile raised a $50 million Series B funding round led by Tiger Global. Hiring remote.
‘Virtual ward’ startup Doccla raised a $17 million Series A funding round led by General Catalyst. Hiring in Sweden.
Climate tech company EverestLabs raised a $16.1 million Series A funding round led by Translink Capital. Hiring in Fremont, CA/US.
Provider of AI solutions for manufacturing Invisible AI raised a $15 million Series A funding round led by Van Tuyl Companies (VTC). Hiring remote.
Adtech startup Lunio raised a $15 million Series A round led by Smedvig Capital. Hiring remote.
GPT-3-powered content platform Regie.ai raised a $10 million Series A funding round led by Scale Venture Partners.