I Promise, this Editorial is NOT About OpenAI
Some major milestones in generative video were announced this week.
Next Week in The Sequence:
Edge 345: Our series about fine-tuning finally dives into reinforcement learning with human feedback (RLHF). We review the original RLHF paper and the TRL (Transformer Reinforcement Learning) library.
Edge 346: We deep dive into LLaVA, the first open-source GPT-4V alternative.
You can subscribe below:
📝 Editorial: I Promise, this Editorial is NOT About OpenAI
I plan to avoid discussing the recent events at OpenAI in this weekend's editorial. There are plenty of other AI newsletters on the planet offering various opinionated takes, even without all the facts. I prefer to wait until more information is available before forming an opinion.
Here are just a couple of key points:
I will never support the public humiliation of an entrepreneur without any clear proof of major wrongdoing.
OpenAI now has a novice management team and board, and it's evident. Among other issues, blindsiding Microsoft regarding these events was hardly a wise decision.
In last week's editorial, I mentioned that OpenAI aspired to be like Apple in 2008. Perhaps a more accurate comparison would be Apple in 1985…
Before the drama unfolded Friday, I had written today's editorial about the progress in generative video. Let's keep it brief 😉.
Long considered one of the most challenging areas for generative AI, video creation is quickly becoming a new frontier in the field. Generative video models must integrate concepts such as movement, physical reactions, time alignment, and interactions between objects, which are not required in traditional image scenarios. Additionally, the number of video datasets is relatively small compared to those for text, images, or audio. Not surprisingly, the video space has lagged behind other generative AI domains. But this is rapidly changing.
The volume and quality of research in generative video are swiftly increasing. Just this week, Meta and Google published new work in this area. Meta AI unveiled their advancements in Emu Video and Emu Edit, marking significant milestones in generative video. Emu Video is a high-quality text-to-video model that generates images from a text prompt and then short videos based on both the text and the images. Emu Edit is an image editing model capable of transforming images based on textual instructions, suitable for both global and local edits.
Also this week, Google Research released a paper on Mirasol3B, a model for the multimodal understanding of long-form videos. Mirasol3B consists of two autoregressive models that infer information from different modalities such as video, audio, or text present in long-form videos. Initial results show Mirasol3B achieving new milestones in video question-answering benchmarks.
Video is emerging as one of the new frontiers in generative AI. Ironically, this is an area where OpenAI has not particularly excelled.
🔎 ML Research
Emu Video and Emu Edit
Meta AI published papers outlining Emu Video and Emu Edit, which represent its latest research in video generation and image editing, respectively. Both models are based on Emu, Meta AI’s first image generation model —> Read more.
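As the editorial notes, Emu Video factorizes generation into two steps: text to image, then text plus image to video. A minimal sketch of that pipeline, using hypothetical stub functions rather than Meta's actual API, looks like this:

```python
# A minimal sketch of Emu Video's factorized text-to-video recipe.
# Both generator functions below are hypothetical stubs, not Meta's API.

def generate_image(prompt: str) -> str:
    """Stage 1 (hypothetical): text-to-image with an Emu-style model."""
    return f"<image conditioned on: {prompt}>"

def generate_video(prompt: str, image: str) -> str:
    """Stage 2 (hypothetical): short clip conditioned on text AND the image."""
    return f"<short clip conditioned on {image} and the original prompt>"

def text_to_video(prompt: str) -> str:
    image = generate_image(prompt)        # first pin down the appearance
    return generate_video(prompt, image)  # then animate it over time

print(text_to_video("a corgi surfing at sunset"))
```

One appeal of this factorization, given the dataset scarcity mentioned in the editorial, is that the first stage can lean on abundant image-text data while the second stage concentrates on motion.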
Long Video Understanding
Google Research published a paper proposing Mirasol3B, a multimodal model that can learn from long-form video, audio, and text. The main innovation of Mirasol3B is that it decouples learning into separate autoregressive models, allowing for higher levels of specialization —> Read more.
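To make the decoupling idea concrete, here is a toy PyTorch sketch, entirely a stand-in of my own rather than Mirasol3B's actual architecture: each modality gets its own small autoregressive component, and fusion happens only at the head.

```python
# Toy illustration of decoupled per-modality autoregressive models
# (NOT Mirasol3B's architecture; modules and sizes are invented here).
import torch
import torch.nn as nn

class TinyAutoregressive(nn.Module):
    """Stand-in autoregressive encoder for one modality."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)  # process the sequence left-to-right
        return out[:, -1]     # last state summarizes the chunk

class DecoupledMultimodal(nn.Module):
    def __init__(self, dim: int = 64, num_answers: int = 10):
        super().__init__()
        self.av_model = TinyAutoregressive(dim)    # audio+video chunks
        self.text_model = TinyAutoregressive(dim)  # text tokens
        self.head = nn.Linear(2 * dim, num_answers)

    def forward(self, av: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Each modality is modeled by its own autoregressive component;
        # the representations are fused only at the end.
        fused = torch.cat([self.av_model(av), self.text_model(text)], dim=-1)
        return self.head(fused)

model = DecoupledMultimodal()
logits = model(torch.randn(2, 16, 64), torch.randn(2, 8, 64))
print(logits.shape)  # torch.Size([2, 10])
```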
Optimizing Models for Different Hardware
Amazon Science published a detailed analysis of the techniques used to optimize neural architecture search (NAS) models across different hardware. The process includes aspects such as curating the search space and incorporating human feedback —> Read more.
Weather Forecasting
Google DeepMind published a paper detailing GraphCast, a weather forecasting model. GraphCast is able to predict weather conditions up to 10 days in advance, beating state-of-the-art models in both accuracy and cost —> Read more.
Ghostbuster
Berkeley AI Research (BAIR) published a paper proposing Ghostbuster, a technique for detecting AI-generated content. Ghostbuster uses LLMs to determine the probability of generating each token in a document and then combines those results in a final classifier —> Read more.
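As a rough illustration of that recipe, the sketch below replaces real LM scoring with synthetic per-token log-probabilities, summarizes them into document-level features, and trains a linear classifier on top. This is a stand-in, not the authors' code, which searches over much richer feature combinations.

```python
# Sketch of a Ghostbuster-style detector: per-token log-probs -> document
# features -> final classifier. Log-probabilities here are SYNTHETIC
# stand-ins for scores that would come from weaker language models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def doc_features(token_logprobs: np.ndarray) -> np.ndarray:
    # Simple summary statistics over per-token log-probabilities.
    return np.array([token_logprobs.mean(), token_logprobs.std(),
                     token_logprobs.min(), token_logprobs.max()])

# Synthetic assumption: AI text tends to be less surprising to an LM,
# i.e. it has higher (less negative) per-token log-probs.
human_docs = [rng.normal(-5.0, 1.5, size=200) for _ in range(50)]
ai_docs    = [rng.normal(-3.5, 0.8, size=200) for _ in range(50)]

X = np.stack([doc_features(lp) for lp in human_docs + ai_docs])
y = np.array([0] * 50 + [1] * 50)  # 0 = human, 1 = AI-generated

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # training accuracy on the synthetic data
```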
🤖 Cool AI Tech Releases
Lyria
Google DeepMind and YouTube collaborated on building Lyria, an advanced music generation model, as well as a set of music AI tools —> Read more.
Microsoft AI Releases
Microsoft announced numerous AI releases at its Ignite conference —> Read more.
LlamaIndex 0.9
The new release of LlamaIndex is here with quite a few new features —> Read more.
NVIDIA AI Foundry Service
NVIDIA announced its AI foundry service, including a family of foundation models, in partnership with Microsoft Azure —> Read more.
🛠 Real World ML
Getting Started with Llama 2
Meta AI published a step-by-step guide to getting started with Llama 2 —> Read more.
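For readers who want to jump straight in, one common path, and an assumption on my part since Meta's guide covers several options, is to load a Llama 2 chat checkpoint through Hugging Face Transformers after accepting the license on the model page:

```python
# Load and prompt Llama 2 via Hugging Face Transformers (one of several
# ways to get started; requires license approval on the model page and
# the `accelerate` package for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain RLHF in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```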
📡AI Radar
In a shocking turn of events, the OpenAI board fired Sam Altman in a very public and humiliating way.
Microsoft announced two new in-house chips: Maia, an AI accelerator for generative AI workloads, and Cobalt, an Arm-based cloud CPU.
Kyutai, a non-profit French AI lab, has raised over $300 million from legendary investors.
Menlo Ventures announced it has raised $1.35 billion to invest in a new generation of AI startups.
AI customer support platform Siena announced $4.7 million in new funding.
CreateSafe, the AI music platform associated with Grimes, raised $4.6 million.
Martian, a platform that pioneers model routing for price optimization, raised $9 million.
SunnySide, an AI digital health platform, announced an $11.5 million Series A.
AI code generation platform CodeGen raised $16 million in new funding.
Tech Spark AI raised $1.4 million to work on a ChatGPT alternative.
AI content marketplace Civitai raised $5.1 million in a new round.
3D generative AI platform Atlas came out of stealth mode with $6 million in funding.
AI content experimentation platform OfferFit raised $25 million in a new round.