In this issue:
we discuss Midjourney, one of the most enigmatic models in the space;
we explore Microsoft’s LAFITE, which can train text-to-image synthesis models without any text data;
we explain Disco Diffusion, an important open-source implementation of diffusion models.
Enjoy the learning!
💡 ML Concept of the Day: What is Midjourney?
Continuing our series about text-to-image synthesis, today we would like to discuss one of the most enigmatic models in the space. Midjourney has quickly become one of the most impressive text-to-image models ever created, producing results that are as impressive as, or more impressive than, those of alternatives like DALL-E, Imagen or Stable Diffusion. By enigmatic, we mean that, despite its popularity, very little has been published about the deep learning techniques powering Midjourney.
Midjourney was created by an AI research lab of the same name led by David Holz, creator of Leap Motion and former researcher at NASA. Very little is known about the method powering Midjourney except that it seems to have been pretrained on billions of images and inspired by models like CLIP. The model appears to run on an infrastructure of more than 10,000 servers. The main interface to Midjourney is a Discord bot that can be used on private and public servers. The bot accepts a series of input commands that let users customize the output. These commands cover processing the text input, customizing the style, adjusting the quality and configuring prompt preferences.
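To make the command-based interface more concrete, here is a minimal sketch of how a prompt with style and quality options might be assembled before being typed into Discord. It assumes the `/imagine` command and the `--ar`, `--q` and `--stylize` parameters as described in Midjourney's public Discord documentation; the accepted values vary by model version, and the helper name `build_imagine_command` is hypothetical. The function only builds a string and does not talk to Midjourney or Discord.

```python
def build_imagine_command(prompt, aspect_ratio=None, quality=None, stylize=None):
    """Assemble the text of a Midjourney-style /imagine command.

    Hypothetical helper for illustration only: it formats a string and
    does not send anything to Discord or Midjourney. The parameter flags
    (--ar, --q, --stylize) are assumed from Midjourney's Discord docs.
    """
    parts = ["/imagine", "prompt:", prompt.strip()]
    if aspect_ratio is not None:
        # Aspect ratio of the output image, e.g. "16:9"
        parts += ["--ar", aspect_ratio]
    if quality is not None:
        # Quality setting: trades rendering time for detail
        parts += ["--q", str(quality)]
    if stylize is not None:
        # Stylization strength: how strongly the model's own aesthetic is applied
        parts += ["--stylize", str(stylize)]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_imagine_command("a lighthouse at dusk", aspect_ratio="16:9", stylize=750))
```

The resulting text would be entered in a Discord channel where the Midjourney bot is present; the bot, not this code, interprets the parameters.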
One astonishing thing about Midjourney is the photorealistic quality of the generated images. This is one of the areas in which Midjourney stands apart from alternative models. Each version of Midjourney has improved on the photorealistic quality of the output.
The quality is so impressive that some images created using Midjourney have gone on to win art competitions. The following piece