Some Non-Obvious Points About OpenAI o1
Plus major funding rounds by World Labs and Glean, Mistral's new release, and more.
Next Week in The Sequence:
Edge 431: Our series about state space models (SSMs) continues with an overview of multimodal SSMs. We discuss the Cobra multimodal SSM and NVIDIA’s TensorRT-LLM framework.
Edge 432: Dives into NVIDIA’s Minitron models distilled from Llama 3.1.
You can subscribe to The Sequence below:
📝 Editorial: Some Non-Obvious Points About OpenAI o1
The release of OpenAI’s new model dominated headlines this week. The o1 models are specialized in reasoning and planning, areas that have long been of interest to OpenAI. Much of the debate in online circles has focused on the model’s specific capabilities, such as whether the terms "reasoning" and "thinking" are appropriate, so there is plenty of content discussing that. Instead of contributing to the debate, I wanted to highlight a few key points that I found particularly interesting while reading the o1 technical report.
It seems that the o1 models were trained and fine-tuned using different methodologies than their predecessors. Specifically, OpenAI used reinforcement learning optimized for chain-of-thought (CoT) scenarios, a relatively novel approach.
Initial results indicate that this reinforcement learning for CoT technique can scale significantly, potentially leading to new breakthroughs in reasoning and planning.
Only CoT summaries, rather than complete CoT traces, are available via the API, making it difficult to determine how the model arrives at specific outputs.
Somewhat paradoxically, CoT-focused models might lower the barrier to interpretability, since we start from a baseline of reasoning traces.
One of the most interesting aspects of o1 is the shift from training to inference compute time. Inference, rather than training, is increasingly becoming a key requirement for complex reasoning tasks. The reasoning core doesn’t necessarily need to be a large model, which could translate into decreases in training time. We will need to see how this strategy evolves over time.
This point makes me think we might be witnessing the start of a new set of scaling laws focused on inference.
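OpenAI has not disclosed how o1 allocates its extra inference compute, but the general idea of trading inference compute for accuracy can be illustrated with a well-known technique, self-consistency: sample several chain-of-thought completions at non-zero temperature and majority-vote over the final answers. The sketch below is purely illustrative and is not o1's mechanism; `sample_answers` is a hypothetical stub standing in for real LLM calls.

```python
from collections import Counter

def sample_answers(prompt: str, n: int) -> list[str]:
    # Stand-in for n stochastic LLM calls. A real system would sample
    # n chain-of-thought completions and extract each final answer.
    # These canned outputs are hypothetical, for illustration only.
    canned = ["42", "42", "41", "42", "40"]
    return canned[:n]

def self_consistency(prompt: str, n: int = 5) -> str:
    """Majority-vote over n sampled answers: spending more inference
    compute improves reliability without changing the model itself."""
    answers = sample_answers(prompt, n)
    best, _count = Counter(answers).most_common(1)[0]
    return best

print(self_consistency("What is 6 * 7?"))  # prints "42"
```

Sampling more completions (larger `n`) costs more at inference time but makes the vote more robust, which is one concrete way an inference-focused scaling law could play out.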
The red-teaming efforts for o1, conducted with companies such as Apollo Research and Haize Labs, are quite impressive and worth exploring in the technical report.
Unsurprisingly, o1 is much harder to jailbreak than previous models, and it spends much more time on inference. That said, there have already been several successful jailbreak attempts.
OpenAI o1 clearly shows that reasoning is one of the next frontiers of foundation model research and, more importantly, that improvements in foundation model architectures are not stalling—they may just take some time to materialize.
🔎 ML Research
LLMs for Novel Research Ideas
AI researchers from Stanford University published a study about the research ideation capabilities of LLMs. The experiment compares human- and LLM-generated ideas across several fields. The results might surprise you —> Read more.
Agent Workflow Memory
Researchers from MIT and Carnegie Mellon University published a paper introducing Agent Workflow Memory (AWM), a method for inducing reusable task workflows in agents. AWM equips agents with workflows distilled from past tasks that can guide future actions —> Read more.
Modular LLMs
Researchers from Princeton University, Carnegie Mellon University, Tsinghua University, UCLA, and several other AI labs published a paper proposing a modular design for LLMs. Specifically, the paper introduces the term “brick” to define a functional block within an LLM and highlights the efficiencies of this composable approach to LLM construction —> Read more.
Better Math Agents
Google DeepMind published a paper introducing a preference learning framework to optimize the performance of math AI models. The framework uses techniques such as multi-turn and tool-integrated reasoning to improve over single-turn math models —> Read more.
WINDOWSAGENTARENA
Researchers from Microsoft, Columbia University, and Carnegie Mellon University published a paper detailing WINDOWSAGENTARENA, an environment for evaluating agents on tasks in the Windows OS. The environment includes over 150 diverse tasks that require capabilities such as screen understanding, tool usage, and planning —> Read more.
LLaMA-Omni
Researchers from several elite Chinese AI labs published a paper proposing LLaMA-Omni, an architecture for integrating speech interaction with open-source LLMs. LLaMA-Omni combines a pretrained speech encoder, a speech adapter, and a streaming speech decoder with an LLM such as LLaMA in order to process text and speech simultaneously —> Read more.
🤖 AI Tech Releases
OpenAI o1
OpenAI released a new family of models specialized in reasoning —> Read more.
AgentForce
Salesforce unveiled AgentForce, its platform for autonomous AI agents —> Read more.
DataGemma
Google open sourced DataGemma, a series of small models grounded in factual data —> Read more.
Pixtral 12B
Mistral released Pixtral 12B, its first multimodal model for images and text —> Read more.
🛠 Real World AI
AI for Coding at Salesforce
Salesforce discusses CodeGenie, an internal tool used to boost developer productivity using generative AI —> Read more.
Data Center Cooling at Meta
Meta discusses the reinforcement learning techniques used for cooling optimization in their data centers —> Read more.
📡AI Radar
AI pioneer Fei-Fei Li’s company World Labs raised another $230 million.
AI-search platform Glean raised $260 million in a Series E.
OpenAI is rumored to be raising a new round at a $150 billion valuation.
Google co-founder Sergey Brin gave a rare interview about his recent work on AI.
Arcee AI released its SuperNova 70B model.
AI agent platform Landbase came out of stealth with $12.5 million in funding.
InMobi secured $100 million for AI acquisition ahead of its IPO.
AI bookkeeping startup Finally raised $200 million.
Stability AI and Lenovo partnered for text-to-image capabilities.
AI translation platform Smartcat raised $43 million.
ServiceNow unveiled a series of AI agents for customer service, procurement, HR and others.
OffDeal announced a $4.7 million round to improve M&A for small businesses.
AI-powered compliance platform Datricks raised $15 million in a new round.