TheSequence

Edge 258: Inside OpenAI's Point-E: The New Foundation Model Able to Generate 3D Representations from Language

The new model combines GLIDE with image-to-3D generation models in a very clever and efficient architecture.

Jan 5

Generative AI and foundation models are dominating the headlines in the deep learning space. Text-to-image models such as DALL-E, Stable Diffusion, and Midjourney have gained tremendous adoption momentum. 3D and video appear to be the next frontier for multimodal generative models. OpenAI has been actively working in this space and quietly unveiled Point-E, a new text-to-3D model able to generate 3D point clouds from natural language inputs.

3D is a particularly challenging domain for generative AI models. Compared to images or even video, 3D datasets are scarce. Additionally, 3D generation involves more than shape: aspects such as texture and orientation are hard to capture in text representations. As a result, traditional supervised methods trained on text-3D pairs face severe scalability limitations. Pretrained models have been somewhat successful at overcoming some of the limitations of supervised methods, and this is precisely the path followed by OpenAI.
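The pipeline described above can be sketched in two stages: a GLIDE-style text-to-image diffusion model first synthesizes a rendered view of the object, and an image-conditioned point-cloud diffusion model then produces xyz coordinates. The sketch below is a minimal conceptual illustration, not the actual `point-e` library API: both stage functions are hypothetical stubs that stand in for the real diffusion models.

```python
import numpy as np


def text_to_image(prompt: str) -> np.ndarray:
    # Stage 1 (stub): in Point-E this is a GLIDE-style text-to-image
    # diffusion model that synthesizes a rendered view of the object
    # described by `prompt`. Here we just return a placeholder
    # 64x64 RGB image derived from the prompt.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((64, 64, 3))


def image_to_point_cloud(image: np.ndarray, num_points: int = 1024) -> np.ndarray:
    # Stage 2 (stub): in Point-E this is an image-conditioned diffusion
    # model that denoises `num_points` xyz coordinates into a point
    # cloud matching the rendered view. Here we return placeholder
    # coordinates seeded from the image.
    rng = np.random.default_rng(int(image.sum() * 1000) % (2**32))
    return rng.standard_normal((num_points, 3))


def text_to_3d(prompt: str, num_points: int = 1024) -> np.ndarray:
    # Chain the two stages: text -> synthetic image -> 3D point cloud.
    image = text_to_image(prompt)
    return image_to_point_cloud(image, num_points)


cloud = text_to_3d("a red traffic cone")
print(cloud.shape)  # one xyz row per point: (1024, 3)
```

Chaining through an intermediate image is what lets the approach sidestep the scarcity of text-3D pairs: each stage can be trained on data that is far more abundant (text-image pairs, and image-3D pairs from synthetic renders).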

© 2023 Jesus Rodriguez, Ksenia Semenova