Fuyu-8B Makes the Case for Simple, Fast, and Powerful Generative AI Models

Sundays, The Sequence Scope brings a summary of the most important research papers, technology releases and VC funding deals in the artificial intelligence space.

Oct 22, 2023

Next Week in The Sequence:

Edge 337: Our series about fine-tuning in foundation models explores QLoRA, including its original paper. We also dive into the fine-tuning tools for the Azure OpenAI Service.
Edge 338: We review Google DeepMind’s WebAgent, an instruction-tuned LLM that can complete tasks on websites.


📝 Editorial: Fuyu-8B Makes the Case for Simple, Fast, and Powerful Generative AI Models

In the never-ending stream of news about generative AI every week, it's challenging to pinpoint what's genuinely significant. Last week saw numerous intriguing developments across the board. Still, what stood out to me was the relatively understated release of one of the most captivating multimodal foundation models in recent times: Fuyu-8B.

Fuyu-8B is a streamlined version of the model powering the Adept.AI platform. Adept is a prominent player in the generative AI domain, having raised over $415 million at a valuation exceeding $1 billion. The platform is dedicated to constructing agents that comprehend high-level objectives and convert them into actions, relying primarily on computer vision and language. ACT-1, dubbed the "transformer for actions," is the force behind Adept. Fuyu-8B is its smaller, open-source counterpart.

What sets Fuyu-8B apart?

First, its architecture is tailor-made for digital agent scenarios. This specialization allows Fuyu to excel at tasks such as answering questions about graphs or understanding concepts across varying image resolutions. On the technical side, the most remarkable aspect of Fuyu-8B is its architectural simplicity. The model uses a standard decoder-only transformer without a specialized image encoder. This makes it easier to understand than other multimodal designs, and it also yields substantial gains in inference speed. In layman's terms: Fuyu-8B is multimodal, straightforward, and swift.
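Adept has described Fuyu's image handling as linearly projecting image patches directly into the first layer of the transformer, with a special image-newline token marking the end of each row of patches so the model can handle arbitrary resolutions. The plain-Python sketch below illustrates that idea only; the patch size, dimensions, toy weights, and all function names are illustrative assumptions, not Fuyu's actual implementation.

```python
from typing import List, Sequence

Vector = List[float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[List[Vector]]:
    """Cut an H x W grayscale image into patch x patch tiles.

    Returns one list per row of patches (raster order), each tile flattened.
    Assumes H and W are multiples of `patch`; a real system would pad.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    rows = []
    for top in range(0, h, patch):
        row = []
        for left in range(0, w, patch):
            row.append([float(image[top + i][left + j])
                        for i in range(patch) for j in range(patch)])
        rows.append(row)
    return rows


def project(x: Vector, weight: List[Vector]) -> Vector:
    """Plain linear layer; `weight` has shape (d_model, patch * patch)."""
    return [sum(w * v for w, v in zip(w_row, x)) for w_row in weight]


def build_decoder_input(image, patch, weight, newline_emb, text_embs):
    """Projected patches, a row-break embedding after each patch row, then text.

    The result is one sequence a decoder-only transformer could consume;
    no separate image encoder is involved.
    """
    sequence: List[Vector] = []
    for row in patchify(image, patch):
        sequence.extend(project(p, weight) for p in row)
        sequence.append(list(newline_emb))  # hypothetical "image newline" token
    sequence.extend(list(t) for t in text_embs)
    return sequence


# Toy usage: a 4 x 6 "image" with 2 x 2 patches and 3 text tokens.
d_model, patch = 4, 2
weight = [[0.1 * (r + c) for c in range(patch * patch)] for r in range(d_model)]
image = [[(r * 6 + c) % 7 for c in range(6)] for r in range(4)]
newline = [0.0] * d_model
text = [[1.0] * d_model for _ in range(3)]
seq = build_decoder_input(image, patch, weight, newline, text)
print(len(seq))  # 2 patch rows x 3 patches + 2 row breaks + 3 text tokens = 11
```

Because the image side lengths only change how many patch rows and columns appear, the same code path handles any resolution that fits the patch grid, which is the property the editorial highlights.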

Fuyu-8B stands out not as just another generalist model but as one being actively refined to power digital agents, a rising trend in generative AI (and a space I'm personally involved in 😉). It represents an interesting development in open-source generative AI and could inspire novel multimodal designs that are both simple and powerful.

🗓️ Join Meta, PepsiCo, Riot Games, Uber & More at apply(ops)

What do HelloFresh, Lidl Digital, Meta, PepsiCo, Pinterest, Prima, Remitly, Riot Games & Uber have in common?

They’ll all be presenting at apply(ops) on Tuesday, November 14, on how they deploy production ML! Databricks’ CEO Ali Ghodsi will also be joining Tecton’s CEO Mike Del Balso for a fireside chat about LLMs, real-time ML, and other trends in ML.


🔎 ML Research

Fuyu-8B

Generative AI startup Adept AI open-sourced Fuyu-8B, the first public version of the model behind its copilot platform. Fuyu-8B is a multimodal model that uses a decoder-only transformer architecture without an image encoder → Read more.

Trustworthiness in GPT Models

Microsoft Research published an assessment of trustworthiness in GPT models. The study evaluates different vectors of trustworthiness such as toxicity, privacy, adversarial robustness and many others → Read more.

Decoding Images from Brain Activity

Meta AI published a paper detailing an AI architecture able to reconstruct images from brain activity. This method could represent an important milestone towards understanding how images are represented in the brain —> Read more.

Batch Calibration in LLMs

Google Research published a paper detailing Batch Calibration, a new calibration method for in-context learning (ICL) in LLMs. Methods of this type are typically used to mitigate performance degradation in ICL caused by bias and other factors → Read more.
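As we understand it, Batch Calibration estimates the model's contextual bias for each class as the average score that class receives across a batch of test inputs, then subtracts that estimate from every example's scores before taking the argmax. The sketch below follows that assumption; the function name and toy scores are hypothetical, and the paper should be consulted for the exact formulation (for example, whether scores are probabilities or log-probabilities).

```python
from typing import List


def batch_calibrate(scores: List[List[float]]) -> List[int]:
    """Return calibrated class predictions for a batch.

    scores[i][j] is the model's score for class j on test input i.
    The per-class mean over the batch is treated as the contextual bias
    and subtracted from each row before choosing the best class.
    """
    n, k = len(scores), len(scores[0])
    bias = [sum(row[j] for row in scores) / n for j in range(k)]
    return [max(range(k), key=lambda j: row[j] - bias[j]) for row in scores]


# Toy batch where the prompt pushes every raw prediction toward class 0.
scores = [[0.70, 0.30], [0.60, 0.40], [0.90, 0.10], [0.55, 0.45]]
print([max(range(2), key=lambda j: r[j]) for r in scores])  # [0, 0, 0, 0]
print(batch_calibrate(scores))                              # [0, 1, 0, 1]
```

The appeal of this family of methods is that the bias estimate comes from the unlabeled test batch itself, so no extra labeled data or content-free prompts are needed.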

Ethical Risks of Gen AI

Google DeepMind published a paper discussing the social and ethical risks of AI systems. The paper proposes a framework for evaluating different risk dimensions, such as human interactions or systemic impacts, in specific contexts → Read more.

🤖 Cool AI Tech Releases

TensorRT-LLM

NVIDIA open sourced TensorRT-LLM, a library for accelerating LLM inference on NVIDIA GPUs → Read more.

🛠 Real World ML

NYT Recipe Recommendations

The New York Times discusses the ML algorithms used for personalized recipe recommendations —> Read more.

Anomaly Detection at Pinterest

Pinterest discusses the architecture that allows them to plug different anomaly detection algorithms into their platform → Read more.
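Pinterest's post describes its own system; the sketch below only illustrates the general pattern of a pluggable detector interface, in which every algorithm implements one method and is looked up by name at run time. All class and function names, and both toy detectors, are hypothetical and not taken from Pinterest's code.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev
from typing import Callable, Dict, List, Type


class AnomalyDetector(ABC):
    """Common interface every pluggable algorithm implements."""

    @abstractmethod
    def detect(self, series: List[float]) -> List[int]:
        """Return indices of points flagged as anomalous."""


_REGISTRY: Dict[str, Type[AnomalyDetector]] = {}


def register(name: str) -> Callable[[Type[AnomalyDetector]], Type[AnomalyDetector]]:
    """Class decorator that makes a detector available under `name`."""
    def deco(cls: Type[AnomalyDetector]) -> Type[AnomalyDetector]:
        _REGISTRY[name] = cls
        return cls
    return deco


@register("zscore")
class ZScoreDetector(AnomalyDetector):
    """Flags points more than `threshold` population std devs from the mean."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold

    def detect(self, series: List[float]) -> List[int]:
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            return []
        return [i for i, x in enumerate(series) if abs(x - mu) / sigma > self.threshold]


@register("threshold")
class StaticThresholdDetector(AnomalyDetector):
    """Flags points above a fixed upper bound."""

    def __init__(self, upper: float):
        self.upper = upper

    def detect(self, series: List[float]) -> List[int]:
        return [i for i, x in enumerate(series) if x > self.upper]


def run_detector(name: str, series: List[float], **params) -> List[int]:
    """Instantiate the registered detector by name and run it."""
    return _REGISTRY[name](**params).detect(series)


series = [10, 11, 9, 10, 12, 10, 50, 11]
print(run_detector("zscore", series, threshold=2.0))  # [6]
print(run_detector("threshold", series, upper=20))    # [6]
```

With this shape, adding a new algorithm means writing one registered class; the code that schedules and runs detection never changes.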

📡AI Radar

Chinese AI startup Zhipu raised $340 million from a group of investors that includes Tencent and Alibaba.
OpenAI made DALL-E 3 available in ChatGPT and its enterprise edition.
Baidu announced Ernie 4.0, the latest version of its marquee LLM.
AI assistant startup Luzia closed $10 million in a new funding round.
Overstory raised $14 million to develop AI capabilities that prevent wildfires.
Square announced a new batch of generative AI features.
Nirvana raised $57 million in new funding to use AI for commercial insurance policies.
AI music generation platform Riffusion raised $4 million in new funding.
Creative Force announced an $8.9 million Series A to use generative AI for e-commerce content generation.
Deepfake detection platform Reality Defender announced $15 million in new funding.
Layer announced a $3 million seed round for its copilot platform.
Jasper unveiled a new copilot for marketing teams.
