Anthropic, WOW
New models, an agent that can interact with your computer, and a new code generation tool.
Next Week in The Sequence:
Edge 443: We close our series about state space models and announce a new and exciting series.
The Sequence Chat: We provide a perspective on transformer models as computers.
Edge 444: We dive into Meta AI’s amazing Movie Gen model.
You can subscribe to The Sequence below:
A small self-serving note before we start 😉:
For the past year, I’ve been working on several ideas in AI evaluation and benchmarking—an area that, as many of you know, presents a massive challenge in today’s AI landscape. After experimenting with various approaches, I decided to incubate LayerLens, a new AI company focused on streamlining the evaluation and benchmarking of foundation models. This marks my third venture-backed AI project in the last 18 months. We've assembled a phenomenal team, with experience at companies like Google, Microsoft, and Cisco, as well as top universities. We’ve also raised a sizable pre-seed round. More details about that in the next few weeks.
We are currently hiring across the board, particularly for roles in AI research and engineering with a focus on benchmarking and evaluation. If you’re interested in this space and looking for a new challenge, feel free to reach out to me at jr@layerlens.ai. I look forward to hearing from some of you!
Now, onto today’s editorial:
📝 Editorial: Anthropic, WOW
What a week for Anthropic. The AI powerhouse announced a wave of exciting new releases, signaling a significant leap forward in AI capabilities. The highlight is undoubtedly the introduction of "computer use," a feature that allows their AI model, Claude, to interact with computers much like a human user would. Claude can now interpret on-screen information, move the cursor, click, and type, opening up a vast array of potential applications previously inaccessible to AI systems. This feature is currently in public beta, available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
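For readers who want to experiment, the computer use beta is exposed through the same Messages API as regular Claude calls. Below is a minimal sketch of what a request might look like using the Anthropic Python SDK; the model identifier, beta flag, and tool parameters mirror the public beta documentation at the time of writing and should be treated as assumptions that may change.

```python
# Minimal sketch of a computer-use request via the Anthropic Python SDK.
# The beta flag, tool type, and display parameters follow the public beta
# documentation at the time of the announcement and may change.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a text editor and type 'hello'."}],
)

# Claude responds with tool_use blocks (take a screenshot, move the cursor,
# click, type); your own agent loop executes those actions and returns the
# results as tool_result messages until the task is complete.
for block in response.content:
    print(block.type)
```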
This advancement in computer use builds upon Anthropic's previous work in tool use and multimodality, enabling Claude to seamlessly interpret screen images and execute tasks using available software tools. The training process involved teaching Claude to accurately count pixels to control cursor movement, a crucial skill for precise mouse commands. Remarkably, Claude demonstrated rapid generalization from training on basic software like calculators and text editors, showcasing its ability to translate user prompts into a series of logical steps and actions on the computer.
In addition to computer use, Anthropic has also released upgraded versions of its existing models. Claude 3.5 Sonnet, the model capable of computer use, has received substantial enhancements, boasting significant performance gains in coding and tool use tasks. Notably, it has achieved industry-leading results on coding benchmarks, surpassing even specialized systems designed for such tasks.
Furthermore, Anthropic is introducing Claude 3.5 Haiku, a new model designed for speed and affordability. It delivers performance comparable to Claude 3 Opus, their previous largest model, at a significantly lower cost and with similar speed to the previous generation of Haiku. Claude 3.5 Haiku excels in coding tasks and boasts low latency, making it well-suited for user-facing applications and situations requiring rapid processing of large data volumes.
Complementing these model upgrades, Anthropic has also introduced a new "analysis tool" in Claude.ai. This tool empowers Claude to write and execute JavaScript code, enabling it to perform data analysis, generate insights, and even create visualizations. Think of it as a built-in code sandbox that allows Claude to perform complex calculations and manipulate data, leading to more precise and reproducible answers.
These new capabilities signal Anthropic’s ambition to enter the agent space at scale. All in all, a remarkable week of releases for Anthropic.
🔎 ML Research
PANGEA
Researchers from Carnegie Mellon University published a paper introducing PANGEA, a multilingual-multimodal LLM supporting 39 languages. The research also includes PANGEABENCH, a benchmark encompassing 14 datasets in 47 languages —> Read more.
Meta Research Artifacts
Meta AI published the research and open-source artifacts behind several models, including Segment Anything 2.1. The release also includes Spirit LM, a model for speech and text integration —> Read more.
Controllable Safety Alignment
Microsoft Research and Johns Hopkins University published a paper proposing Controllable Safety Alignment (CoSA), a framework designed to adapt LLMs to different safety constraints without retraining. CoSA allows models to follow safety instructions in natural language —> Read more.
CoT and Vision-Language Models
Researchers from Apple and Carnegie Mellon University published a paper showcasing the impact of CoT in vision-language models (VLMs). The paper uses a technique that distills CoT traces from LLMs and uses those to fine-tune VLMs —> Read more.
BLIP-3-Video
Salesforce Research published a paper introducing xGen-MM-Vid (BLIP-3-Video), a multimodal LLM for video. xGen-MM-Vid uses techniques such as temporal encoders and visual tokenizers to capture temporal information over multiple frames —> Read more.
Sabotage Evaluations
Anthropic published a research paper introducing sabotage evaluations for frontier models. These evaluations quantify the ability of a foundation model to subvert human oversight in specific contexts —> Read more.
🤖 AI Tech Releases
Claude
Anthropic released an upgraded version of Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku —> Read more.
Claude Computer Use
The latest version of Claude can take actions in computer environments —> Read more.
Quantized Llama
Meta released two quantized versions of Llama 3.2 with 1B and 3B parameters respectively —> Read more.
Stable Diffusion 3.5
Stability AI open sourced a new version of its marquee text-to-image model —> Read more.
AutoTrain
HuggingFace open sourced AutoTrain, a framework for training LLMs with a few clicks —> Read more.
IBM Granite
IBM released Granite, a family of models optimized for enterprise workloads —> Read more.
🛠 Real World AI
Recommendations at Amazon
Amazon explores the ML techniques used to remove bias in recommendations —> Read more.
📡AI Radar
There are rumors that OpenAI will release its next big model before the end of the year.
Microsoft released a new wave of AI agents for its Dynamics 365 CRM platform.
Runway showcased a preview of Act-One, a new tool for generating expressive characters.
Humanoid robotics startup Agility is closing a $150 million round.
Apple released an API for its upcoming Apple Intelligence features.
Ideogram introduced Canvas, a new interface for inpainting and outpainting capabilities.
AI notepad app Granola raised a $20 million Series A.
Agentic banking platform interface.ai raised $30 million in new funding.
Cohere released multimodal embeddings.
Asana announced its AI Studio for automating repetitive tasks.
Midjourney announced a new image editor.
Neysa, a cloud AI platform, raised $30 million in new funding.
Pharos raised $5 million to use AI in medical reporting.