Inside LLaVA: The Very Popular Open Source Alternative to GPT-4V

The model outperforms GPT-4 in several visual instruction tasks.

Nov 23, 2023

A creative interpretation of a llama with red volcanic skin, symbolizing a powerful AI model. The llama is wearing glasses, emphasizing its intelligence, and is intently typing on a laptop, suggesting it's coding artificial intelligence software. The scene has a modern, technological ambiance, with digital elements subtly incorporated in the background to highlight the coding theme. The llama's posture should reflect concentration and expertise. — Created Using DALL-E

Today, we celebrate the Thanksgiving holiday in the United States, when it is customary to give thanks for our blessings over the past year. I wanted to take a moment to express my gratitude for your support of this newsletter. Writing this much deeply technical content on a weekly basis is not easy, particularly considering I have operational responsibilities in three other companies. I do it because I believe it is a small contribution to raising awareness about new AI research and technology, but also because I am fortunate to have a very engaging, intellectually curious, and technically rigorous audience that holds this newsletter to a high standard and makes writing it really enjoyable.

For that, thank you.

Happy Thanksgiving.

JR

Now, on to today’s edition:

A few weeks ago, OpenAI unveiled new image and audio processing capabilities in GPT-4. As part of that release, the AI lab announced a new model known as GPT-4 Vision (GPT-4V), which allows users to instruct GPT-4 to analyze image inputs. GPT-4V is an interesting development in the multimodal foundation model space. A few days after the GPT-4V announcement, we already had the first open-source alternative. Researchers from the University of Wisconsin-Madison and Microsoft Research introduced Large Language and Vision Assistant (LLaVA), a LLaMA-based multimodal LLM that takes images and text as input.

TheSequence

