ChatGPT and Whisper APIs
On Sundays, The Sequence Scope brings a summary of the most important research papers, technology releases, and VC funding deals in the artificial intelligence space.
Next Week in The Sequence
Edge #271 discusses a taxonomy for understanding federated learning models, Meta AI’s research on building faster and more scalable federated learning systems, and Microsoft’s FLUTE federated learning framework.
Edge #272 will explore Toolformer, Meta AI’s new transformer model that learned to use tools such as a calendar, a calculator, or Wikipedia search to master complex tasks.
📝 Editorial
It was literally impossible to select a different topic for this week’s editorial 😉. The hype around ChatGPT is tremendous, and OpenAI is doing a masterful job of capturing the momentum with regular product releases and business milestones. This week, the AI powerhouse unveiled the first versions of the ChatGPT and Whisper APIs, which make it easier to integrate conversational and audio-based intelligent experiences into applications.
The ChatGPT API has to be one of the most anticipated releases in the developer community. The API is powered by the gpt-3.5-turbo model, which lets developers build chat/conversational experiences from sequences of messages rather than the single prompts of previous models. gpt-3.5-turbo is also the most efficient model for many non-chat use cases, and it is cheaper than its predecessors.
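The shift to message sequences is easy to see in code. Below is a minimal sketch of a chat completion call using the openai Python library; the API key, system prompt, and user message are placeholders for illustration, not part of OpenAI’s announcement.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder key

# gpt-3.5-turbo consumes a list of role-tagged messages rather than a single prompt
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what a chat-based API changes for developers."},
    ],
)

# The assistant's reply comes back as another message in the same format
print(response["choices"][0]["message"]["content"])
```

Because earlier turns are simply appended to the messages list, multi-turn conversations reuse the same call.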
OpenAI also unveiled the first API for Whisper, its well-known speech recognition model released in 2022. The API focuses on speech-to-text transcription and translation scenarios. Combining Whisper with gpt-3.5-turbo opens the door to very interesting dual-modal audio-language scenarios.
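As a rough sketch of that combined pattern, assuming a local audio file named meeting.mp3 (a hypothetical example, not from the announcement), an application can transcribe with the Whisper API and then hand the transcript to gpt-3.5-turbo for a language task.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder key

# Step 1: speech-to-text with the Whisper API
with open("meeting.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

# Step 2: feed the transcript to gpt-3.5-turbo, e.g. for summarization
summary = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarize transcripts in three bullet points."},
        {"role": "user", "content": transcript["text"]},
    ],
)

print(summary["choices"][0]["message"]["content"])
```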
In addition to the Whisper and ChatGPT API releases, OpenAI announced the availability of dedicated instances for its API via the Azure cloud. The core idea is to give large organizations more control over their interaction with OpenAI models while maintaining a consistent developer experience.
With this week’s releases, the OpenAI API now spans language, image, and audio models, making it an incredibly complete offering for developers.
🔎 ML Research
Transformers for Robot Navigation
Google Research published a paper that presents a robot navigation technique using a model predictive controller and transformers —> Read more.
Robotic Ad Click Detection
Amazon Science published a paper introducing SLIce, a technique for detecting whether clicks on ads are produced by robots or humans —> Read more.
Morally Self-Correcting LLMs
Anthropic published a research paper with three studies highlighting how RLHF techniques can help LLMs morally self-correct —> Read more.
Composer
Alibaba Research published a paper proposing Composer, a 5B parameter text-to-image diffusion model optimized for controllability —> Read more.
🤖 Cool AI Tech Releases
ChatGPT and Whisper APIs
OpenAI released the first API versions of ChatGPT and Whisper, as well as dedicated instance infrastructure —> Read more.
Anthropic Early Apps
OpenAI rival Anthropic started granting a select group of startups access to its ChatGPT competitor —> Read more.
📡AI Radar
Generative AI startup TypeFace came out of stealth mode with a $65 million fundraise.
Stability AI is reportedly raising a new round of funding at a $4 billion valuation.
Korean startup Indent raised $8.1 million for its customer video review tool.
Eleuther AI announced the EleutherAI Institute, a new nonprofit organization focused on advancing foundation model research and development.
Qwak raised $12 million to advance its MLOps platform.
SESAMm raised a $35 million series B2 for its sentiment analysis platform.
Alphabet’s self-driving car unit Waymo laid off 200 people.