Edge 264: Inside Muse: Google’s New Text-to-Image Super Model
The new generative AI model shows significant efficiency improvements over models like Stable Diffusion, Imagen and Parti.
Text-to-image (TTI) models have been at the center of the generative AI revolution, with models such as DALL-E, Stable Diffusion, and Midjourney capturing the headlines. This explosion in high-quality TTI models has been fundamentally powered by diffusion or autoregressive methods that can effectively compute similarities between text and images. The nascent nature of these architectures makes them relatively prohibitive from a computational standpoint, and there is still a lot of work to be done to improve their efficiency and cost. Recently, Google unveiled Muse, a TTI model that achieves state-of-the-art image quality while remaining more efficient than diffusion and autoregressive models.
Muse follows Google’s active work in TTI with diffusion models such as Imagen and autoregressive models like Parti. Muse builds on the lessons learned from building both architectures to improve computational efficiency while achieving the same level of image quality.