🍱 The Text-to-Image Synthesis Revolution

Weekly news digest curated by industry insiders

Aug 21, 2022

📝 Editorial

Next week, we will start a new series about text-to-image synthesis models. Over the last year, this deep learning discipline has seen an astonishing level of progress. You have probably heard of OpenAI's DALL-E 2, but plenty of other impressive text-to-image generation models have been created in the last few months. Google came up with models like Imagen and Parti; Meta did amazing work with Make-A-Scene; OpenAI created GLIDE and, of course, DALL-E 2. All these models push the boundaries of text-to-image synthesis in ways that challenge human imagination. However, innovation is coming not only from the big AI labs but also from startups in the space. Midjourney, a text-to-image synthesis model created by a relatively small startup, quite often shows artistic qualities superior to those of models built by the big AI incumbents. Just this week, AI startup Stability AI released a new model, Stable Diffusion, which shows impressive performance.

The text-to-image synthesis revolution has been catalyzed by progress in language models over the last few years. What is fascinating about text-to-image synthesis is that it immediately appeals to both graphic artists and mainstream audiences. Art is the most important materialization of human creativity and imagination and, for years, has been considered one of the boundaries between machine and human intelligence. Now text-to-image synthesis models are crossing that boundary, offering visual evidence that fuels the debate over whether AI can show creativity and imagination. Regardless of where that debate lands, it is pretty clear that text-to-image synthesis has overtaken natural language understanding as the field that dominates AI headlines these days. The next few months will likely bring fascinating developments to this nascent field.

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#219: we start the new series about text-to-image models; discuss CLIP, a neural network that learns image representations from natural language supervision (images paired with text); and explore Hugging Face's CLIP implementation.

Edge#220: we deep dive into Meta AI’s Make-A-Scene, which pushes the boundaries of AI art synthesis.

Now, let's review the most important developments in the AI industry this week.

🔎 ML Research

AI Agent Agency

DeepMind published a fascinating paper that describes a causal modeling method for better understanding the incentives of AI agents and explains how to tailor training based on that knowledge →read more

Distributed GNN Training

Amazon Research published a paper proposing a distributed training approach for graph neural networks →read more

Language for Robots

Google Research published a paper proposing a method that leverages advanced language models to let robots follow instructions in the physical world →read more

Hyperparameter Tuning and Transformers

Google Research published a paper detailing OptFormer, which it describes as the first Transformer-based framework for hyperparameter optimization →read more

✏️ Data Labeling Survey

How should you handle data when preparing it for ML? What are the best labeling methods and tools for ML solutions today? We keep learning from the experience of the engineers and entrepreneurs behind leading data labeling solutions such as Toloka, Superb AI, Label Studio, and more.

Please take a short survey to help us prepare an article about data labeling. It will take about 2-3 minutes.


🤖 Cool AI Tech Releases

Stable Diffusion

AI startup Stability AI launched Stable Diffusion, a text-to-image synthesis model based on latent diffusion techniques →read more

Cloudera Data Lakehouse

Cloudera announced the release of CDP One, a data-lakehouse-as-a-service solution with integrated storage, compute, and ML capabilities →read more

New TorchVision APIs

PyTorch added new APIs to its TorchVision library for listing and initializing models and their pretrained weights →read more

🛠 Real World ML

NY Times Paywall

The NY Times shares details of the ML it uses to make its paywall smarter →read more

💸 Money in AI

Data processor provider Pliops raised a $100 million Series D funding round led by Koch Disruptive Technologies (KDT). Hiring in Israel.
Conversational AI startup Modulate raised $30 million in a Series A funding round led by Lakestar. Hiring in Cambridge, MA/US.
AGI startup Keen Technologies raised a $20 million round, led by Nat Friedman and Daniel Gross.
AIOps company BigPanda raised $20 million in an extension of its Series E round, with contributions from UBS Next and Wells Fargo Strategic Capital. Hiring in Israel, the US, and Europe.
Cloud infrastructure optimization company Sync Computing raised $15.5 million in Series A funding led by Costanoa Ventures. Hiring remote.
Customer-facing analytics service Explo raised $12 million in a Series A round led by Craft Ventures. Hiring in San Francisco and New York/US.

TheSequence