TheSequence

Edge 258: Inside OpenAI's Point-E: The New Foundation Model Able to Generate 3D Representations from Language

The new model combines GLIDE with image-to-3D generation models in a very clever and efficient architecture.

Jan 05, 2023

Generative AI and foundation models are dominating the headlines in the deep learning space. Text-to-image models such as DALL-E, Stable Diffusion, and Midjourney have gained tremendous momentum in terms of adoption. 3D and video seem to be the next frontier for multimodal generative models. OpenAI has been actively working in this space and quietly unveiled Point-E, a new text-to-3D model that is able to generate 3D point clouds from natural language inputs.

3D is a particularly challenging domain for generative AI models. Compared to images or even video, 3D datasets are seldom available. Additionally, 3D generation involves more than shape: it includes aspects such as texture and orientation that are hard to capture in a text representation. As a result, traditional supervised methods based on text-3D pairs face severe scalability limitations. Pretrained models have been somewhat successful in overcoming some of those limitations, and this is precisely the path followed by OpenAI.
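To make that path concrete, here is a minimal sketch of how a point cloud can be sampled from a text prompt with the open-source point-e package that OpenAI published alongside the model. It mirrors the text-to-point-cloud example in the repository: the base40M-textvec checkpoint conditions directly on the text prompt rather than on an intermediate GLIDE image, and the exact configuration names and sampler parameters are taken from that example and may change across releases.

```python
import torch

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Text-conditioned base model (~40M parameters) that produces a coarse
# 1,024-point cloud, plus an upsampler that densifies it to 4,096 points.
base_model = model_from_config(MODEL_CONFIGS['base40M-textvec'], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint('base40M-textvec', device))

upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[
        diffusion_from_config(DIFFUSION_CONFIGS['base40M-textvec']),
        diffusion_from_config(DIFFUSION_CONFIGS['upsample']),
    ],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],         # the model also predicts per-point color
    guidance_scale=[3.0, 0.0],            # guide the base model; leave the upsampler unguided
    model_kwargs_key_filter=('texts', ''),  # only the base model sees the prompt
)

# Run the two-stage diffusion and keep the final sample.
samples = None
for x in sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=['a red motorcycle'])):
    samples = x

point_cloud = sampler.output_to_point_clouds(samples)[0]
```

The design choice worth noting is the two-stage pipeline: a small diffusion model produces a coarse point cloud, and a second model merely upsamples it, which is what lets Point-E produce samples in a minute or two on a single GPU instead of the hours required by optimization-based text-to-3D methods.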
