Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week
Two papers that open new possibilities for generative AI.
Next Week in The Sequence:
Edge 375: Our series about reasoning in LLMs continues by exploring Meta’s recent work on System 2 Attention. We also review the Chainlit framework for building LLM applications.
Edge 376: We dive into the amazing SGLang framework created at UC Berkeley, which provides significant performance gains in LLM inference.
You can subscribe below!
📝 Editorial: Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week
Every week, there is an avalanche of research papers pioneering new techniques in generative AI, but only a tiny percentage of those papers contain contributions that are truly going to push the boundaries of the space. Last week was exceptional in terms of published papers, with two that could have a remarkable impact on the next few years of generative AI.
Text to Games with Genie
Google DeepMind continues to challenge our imagination when it comes to generative AI. Last week, the research lab unveiled Genie, a generative model that can create a playable 2D video game from a text description, a sketch, or a photo. What makes Genie remarkable is its ability to learn fine-grained controls despite being trained solely on videos, which typically don’t include labels for the actions being performed in them. Genie not only learns actions from video sequences but also variations of those actions that are applicable to the same environment. Amazing!
Genie is still at a very early stage, but its impact could be profound. From simulations and gaming to robotics, the ability to generate interactive environments could become one of the next frontiers for generative AI.
1-Bit LLMs
Computational and memory costs are some of the biggest roadblocks to the adoption of LLMs. Techniques such as quantization can improve inference time but often sacrifice accuracy. Recently, a team of researchers from Microsoft and the University of Chinese Academy of Sciences proposed an architecture called BitNet that uses an extreme form of quantization, 1-bit weights, to improve cost efficiency without sacrificing performance. Last week, the team doubled down and proposed a variant of the original BitNet called BitNet b1.58, which provides additional gains in cost-effectiveness, memory, latency, and throughput. BitNet b1.58 accomplishes this by representing each weight of the model with only 1.58 bits (a ternary value, since log2(3) ≈ 1.58) instead of the 16-bit precision typical of most LLMs.
The implications of BitNet b1.58 for generative AI could be quite significant. The new architecture could open the door to scaling the training and inference of LLMs on commodity hardware, and, if nothing else, the performance gains over current architectures should be notable.
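To see why low-bit weights matter for commodity hardware, here is a rough back-of-the-envelope estimate of weight storage; the 7B parameter count is just an illustrative example, not a figure from the paper, and the calculation ignores activations, caches, and packing overhead.

```python
# Illustrative memory estimate for storing model weights only.
# Assumption: a hypothetical 7B-parameter model; activations, KV caches,
# and per-tensor scaling metadata are ignored.
params = 7e9

fp16_gb = params * 16 / 8 / 1e9       # 16 bits per weight  -> ~14.0 GB
ternary_gb = params * 1.58 / 8 / 1e9  # 1.58 bits per weight -> ~1.4 GB

print(f"fp16 weights:    {fp16_gb:.1f} GB")
print(f"ternary weights: {ternary_gb:.1f} GB (~{fp16_gb / ternary_gb:.0f}x smaller)")
```

Even this crude estimate suggests roughly an order-of-magnitude reduction in the memory needed just to hold the weights.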
Both Genie and the 1-Bit LLM represent major research milestones in areas that were deemed impossible a few months ago. The pace of research in generative AI is breathtaking. Amazing times.
Learn from top GenAI experts at GenAI Productionize 2024 – an industry-first summit on productionizing enterprise GenAI!
We're only a week away from LinkedIn, Google, Coinbase, Roblox, Comcast, Fidelity, Procter & Gamble, Chegg, LlamaIndex, and more teaching how to get GenAI apps into production, including practical strategies for governance, evaluation, and monitoring.
🔎 ML Research
Genie
Google DeepMind published a paper introducing generative interactive environments (Genie), a model that can generate interactive, playable environments from a single image prompt. Genie was trained on a dataset of 2D game and robotics videos, and the approach seems quite generalizable to other domains —> Read more.
1-Bit LLMs
Microsoft Research published a paper proposing BitNet b1.58, a 1-bit LLM variant that uses 1.58 bits per parameter, which leads to massive savings in compute and memory requirements without sacrificing performance. Unlike traditional 16-bit models, BitNet b1.58 uses a ternary {-1, 0, 1} encoding for every weight while matching the performance of a full-precision 16-bit model —> Read more.
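For intuition, below is a minimal NumPy sketch of the kind of absmean ternary quantization the paper describes: scale each weight tensor by its mean absolute value, then round and clip to {-1, 0, 1}. The function name and toy usage are illustrative, not the authors' implementation.

```python
import numpy as np

def ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a full-precision weight matrix to the ternary set {-1, 0, 1}.

    Sketch of an absmean-style scheme: scale by the mean absolute value,
    then round and clip each entry to [-1, 1].
    """
    gamma = np.abs(W).mean()                        # per-tensor scale
    W_ternary = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_ternary.astype(np.int8), gamma         # keep the scale for dequantization

# Toy usage: quantize a random weight matrix and inspect the value set.
W = np.random.randn(4, 4).astype(np.float32)
W_q, gamma = ternary_quantize(W)
print(W_q)             # entries are only -1, 0, or 1
print(np.unique(W_q))  # subset of {-1, 0, 1}
```

Because every weight lands in {-1, 0, 1}, matrix multiplications reduce largely to additions and subtractions, which is where the latency and throughput gains come from.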
EMO
Alibaba Research published a paper detailing EMO, a framework for generating expressive portrait videos from an input image and audio. EMO combines a ReferenceNet that extracts features from the reference image with a diffusion model that generates the final video frames —> Read more.
Finetuning and Scaling
Google DeepMind published a paper analyzing the effectiveness of fine-tuning methods relative to the scale of LLMs. The analysis covers the effects of both data size and model size on fine-tuning methods —> Read more.
Generating Better Images with Hierarchical Prompts
Microsoft Research published a paper detailing a technique to enhance images created by visual language models using hierarchical prompts. The method creates detailed graphs of image descriptions, which are then used to generate more detailed images —> Read more.
🤖 Cool AI Tech Releases
Mistral Large
Mistral announced its biggest model so far, Mistral Large, which matches the performance of GPT-4 across several benchmarks —> Read more.
Le Chat
Mistral also unveiled Le Chat, a ChatGPT competitor built on its foundation models —> Read more.
Samba-1
NVIDIA competitor SambaNova released Samba-1, a one trillion parameter model optimized for enterprise scenarios —> Read more.
StarCoder2
BigCode released StarCoder2, an open-source code-generation LLM —> Read more.
🛠 Real World ML
AI-Assisted Development at Pinterest
Pinterest discusses lessons learned and best practices for enabling AI-assisted development processes —> Read more.
AI Code Generation at GitHub
GitHub shares some insights and best practices about AI code generation —> Read more.
📡 AI Radar
Elon Musk is suing OpenAI for breaching its original non-profit founding mission.
Humanoid robotics startup Figure AI raised $675 million at an astonishing $2.6 billion valuation.
Semantic search platform Glean raised a monster $200 million round.
AI hardware sensation Groq announced the acquisition of Definitive Intelligence and launched its GroqCloud solution.
AI image generation startup Ideogram announced a new $80 million raise.
Snowflake CEO Frank Slootman announced he is retiring and will be succeeded by Sridhar Ramaswamy, CEO of AI platform Neeva.
AI app platform FlowGPT announced a $10 million pre-Series A round.
AI photo editing platform Photoroom raised $43 million in a new round of funding.
AI publishing platform Inkitt raised a $37 million Series C.
H2O.ai released Danube, a 1.8B-parameter model optimized for IoT.
GitHub’s Copilot Enterprise hit general availability.
Adobe announced new AI tools for audio creation and editing.
Email startup Superhuman released AI-powered instant replies.
AI news reader Particle raised $4.4 million and launched a private beta.