TheSequence
Edge 252: Another Foundation Super Model: Google’s DreamFusion Can Convert Text to 3D

Another breakthrough in generative AI: DreamFusion uses diffusion models to generate 3D objects.

Dec 15, 2022


On Thursdays, we dive deep into one of the newest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI to complement the concepts we debate in other editions of our newsletter.

Generative AI has been enjoying an impressive renaissance, triggered largely by the emergence of diffusion architectures. DALL-E 2, Midjourney, Stable Diffusion, and Imagen are diffusion-based models reaching impressive milestones in areas such as text-to-image and text-to-video. Text-to-3D is often mentioned as one of the next frontiers for diffusion techniques, but the path there is far from trivial. Recently, Google unveiled DreamFusion, a diffusion-based neural network that can generate realistic 3D representations from text inputs.

Diffusion architectures allow these models to be pretrained on monumentally large volumes of unlabeled text and image data. Extrapolating that approach to 3D is far from easy: there are no comparably large datasets of 3D assets, and the core diffusion mechanism of adding noise to an image and learning to reconstruct it becomes vastly more complex when the target is a full 3D object.
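To make that core mechanism concrete, here is a minimal sketch of the 2D denoising objective that diffusion models are trained on. The `predicted_noise` function is a hypothetical stand-in for a large U-Net; the noise schedule value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion: noise a clean image x0 at timestep t, then score a
# model on how well it recovers the injected noise. This is the objective
# that 2D diffusion models scale so well on; a native 3D version would
# need large 3D datasets and a 3D denoiser, which is the gap DreamFusion
# works around.
x0 = rng.standard_normal((8, 8))          # a clean training "image"
alpha_bar = 0.3                           # cumulative noise level at step t
eps = rng.standard_normal(x0.shape)       # injected Gaussian noise
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

def predicted_noise(x_t, t):
    # Hypothetical denoiser; a real one is a text-conditioned U-Net.
    return 0.1 * x_t

# Standard denoising loss: mean squared error between predicted and true noise.
loss = np.mean((predicted_noise(x_t, t=500) - eps) ** 2)
```

In a real training loop this loss is averaged over random timesteps and backpropagated through the denoiser, which is exactly the recipe that becomes impractical when the training data would have to be 3D scenes.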

Enter DreamFusion

With DreamFusion, Google circumvents some of the known limitations of diffusion models applied to 3D data by using a pretrained 2D text-to-image model to perform 3D synthesis. More specifically, DreamFusion uses Google's own Imagen as its text-to-image foundation. The architecture also introduces a technique called Score Distillation Sampling (SDS), which generates samples in a 3D parameter space by optimizing a loss function rather than by sampling in pixel space. Another component DreamFusion relies on heavily is the neural radiance field (NeRF), a technique that reconstructs 3D scenes from partial 2D views.
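The SDS idea can be sketched in a few lines. The sketch below is an illustrative toy, not DreamFusion's implementation: `toy_denoiser` is a hypothetical stand-in for Imagen's frozen noise predictor, and the final update is applied to a raw image array rather than to NeRF parameters. What it shows is the key move: noise a rendered view, ask the frozen 2D model to predict the noise, and use the prediction error as a gradient signal for the 3D representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t, text_embedding):
    # Hypothetical stand-in for Imagen's noise predictor eps(x_t; y, t).
    # A real model conditions on the text prompt; this just returns a
    # deterministic function of the noisy image so the sketch runs.
    return 0.1 * x_t + 0.01 * text_embedding.mean()

def sds_gradient(rendered_image, text_embedding, t, alpha_bar):
    """Score Distillation Sampling gradient for one rendered view.

    Noise the render, have the frozen 2D diffusion model predict the
    noise, and treat (predicted - true) noise as the gradient w.r.t.
    the rendered pixels. No gradient flows through the diffusion model.
    """
    eps = rng.standard_normal(rendered_image.shape)              # true noise
    x_t = np.sqrt(alpha_bar) * rendered_image + np.sqrt(1 - alpha_bar) * eps
    eps_hat = toy_denoiser(x_t, t, text_embedding)               # frozen model
    w_t = 1 - alpha_bar                                          # weighting w(t)
    return w_t * (eps_hat - eps)

# One toy optimization step on an 8x8 "render" of the 3D scene.
image = rng.standard_normal((8, 8))
text = rng.standard_normal(16)
grad = sds_gradient(image, text, t=500, alpha_bar=0.5)
image -= 0.1 * grad  # in DreamFusion this update flows into the NeRF weights
```

Because the diffusion model stays frozen, all learning happens in the 3D parameter space, which is what lets a 2D-only model supervise 3D synthesis.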

Putting all these components together, the DreamFusion algorithm works in the following steps:

This post is for paid subscribers

© 2025 Jesus Rodriguez