Edge 272: Inside Toolformer, Meta AI’s New Transformer That Learned to Use Tools to Produce Better Answers
The model mastered the use of tools such as calculators, calendars, or Wikipedia search queries across many downstream tasks.
Today’s large language models have made remarkable strides across a range of natural language processing tasks, displaying impressive emergent capabilities along the way. However, these models have certain inherent limitations that can only be partially mitigated by increasing their size. These limitations include an inability to access recent events, a tendency to fabricate information, difficulties in processing low-resource languages, a lack of mathematical proficiency, and an unawareness of the passage of time. One promising approach to overcoming these limitations is to equip language models with the ability to use external tools such as search engines, calculators, or calendars. However, current solutions either require extensive human annotations or are restricted to specific tasks, hindering wider adoption. A few days ago, Meta AI published a research paper detailing Toolformer, a novel model that learns to use tools in a self-supervised manner without the need for human annotations.
Meta AI’s approach with Toolformer is based on in-context learning and the generation of datasets from scratch. Given just a few examples of how an API can be used, Toolformer annotates a large language modeling dataset with potential API calls. Through a self-supervised loss, the model determines which API calls actually help it predict future tokens, and it is fine-tuned accordingly. As a result, the model retains its generality and independently decides when and how to use each tool, enabling a more comprehensive use of tools that is not tied to specific tasks.
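To make that in-context learning step concrete, here is a minimal sketch of the kind of few-shot prompt that could be used to get a model to annotate text with calculator calls. The wording is illustrative and paraphrased, not the paper’s verbatim prompt:

```python
# Illustrative few-shot prompt for sampling calculator API calls.
# The wording paraphrases the style of prompt described in the paper.
CALCULATOR_PROMPT = """\
Your task is to add calls to a Calculator API to a piece of text.
You can call the API by writing "[Calculator(expression)]".

Input: The population grew from 5,000 to 6,200, an increase of 24%.
Output: The population grew from 5,000 to 6,200, an increase of
[Calculator((6200 - 5000) / 5000)] 24%.

Input: {text}
Output:"""
```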
Inside the Toolformer Architecture
The core idea behind Toolformer is to enhance a language model (M) with the ability to use different tools via API calls. The inputs and outputs for each API are represented as text sequences, which enables the integration of API calls into any text using special tokens.
For training, Meta AI represents each API call as a tuple (ac, ic), where ac is the name of the API and ic is its input. Given an API call (ac, ic) with a corresponding result r, the linearized sequences of the API call without and with the result are denoted e(ac, ic) and e(ac, ic, r), respectively. The first step of training is to convert a dataset of plain text into an augmented dataset by inserting API calls. This is done in three steps: sampling potential API calls, executing the API calls, and filtering the API calls based on their usefulness in predicting future tokens.
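As a concrete illustration, the linearization step might look like the following Python sketch. The <API>, </API>, and → markers follow the paper’s notation; in the actual implementation they are mapped to character sequences that already exist in the model’s vocabulary (the paper uses “[”, “]”, and “->”):

```python
API_START, API_END, RESULT_SEP = "<API>", "</API>", "→"

def linearize(api_name: str, api_input: str, result: str | None = None) -> str:
    """Serialize an API call (ac, ic), optionally with its result r,
    into the plain-text form inserted into the training data."""
    call = f"{API_START} {api_name}({api_input})"
    if result is not None:
        call += f" {RESULT_SEP} {result}"
    return f"{call} {API_END}"

print(linearize("Calculator", "400 / 1400"))          # e(ac, ic)
print(linearize("Calculator", "400 / 1400", "0.29"))  # e(ac, ic, r)
```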
After filtering the API calls, they are merged and interleaved with the original inputs to form the augmented dataset. The language model is then finetuned on this augmented dataset, allowing it to make its own decisions on when and how to use each tool based on its own feedback.
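The usefulness test behind the filtering step can be made concrete: a sampled call is kept only if conditioning on the call and its result lowers the model’s loss over the following tokens by at least some threshold, compared with the better of making no call at all or making the call without its result. Here is a rough sketch, where lm.loss is an assumed helper returning a cross-entropy value (the paper additionally weights future tokens by their distance from the call, which this sketch omits):

```python
def keep_api_call(lm, tokens, i, call_with_result, call_without_result, tau=1.0):
    """Keep an API call inserted at position i only if providing the call
    *and* its result reduces the loss over future tokens by at least tau."""
    prefix, future = tokens[:i], tokens[i:]
    loss_plus = lm.loss(prefix + [call_with_result], future)         # L+: call with result
    loss_no_call = lm.loss(prefix, future)                           # no call at all
    loss_no_result = lm.loss(prefix + [call_without_result], future) # call, no result
    loss_minus = min(loss_no_call, loss_no_result)                   # L-: best baseline
    return (loss_minus - loss_plus) >= tau
```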
In the inference stage, the model generates text as usual until it produces the “→” token, indicating that it expects an API response next. The appropriate API is then called to obtain the response, and decoding continues after inserting the response followed by the </API> token.
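In code, such a modified decoding loop might look like the sketch below; model.next_token and the apis dictionary of callables are assumed interfaces for illustration, not a real library API:

```python
def generate_with_tools(model, apis, prompt, max_tokens=256):
    """Generate normally, but when the model emits the result marker "→",
    pause decoding, execute the pending API call, splice the result plus
    "</API>" back into the context, and resume."""
    text = prompt
    for _ in range(max_tokens):
        token = model.next_token(text)  # assumed greedy-decoding helper
        text += token
        if token == "→":
            # Recover the pending call, e.g. "<API> Calculator(3 * 7) →"
            call = text[text.rfind("<API>") + len("<API>"):-len(token)].strip()
            name, arg = call.split("(", 1)
            result = apis[name](arg.rstrip(")"))  # execute the tool
            text += f" {result} </API>"           # insert r and close the call
    return text
```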
The researchers investigated various tools to address the limitations of regular language models (LMs). The only requirements for these tools are that their inputs and outputs can be represented as text sequences and that a few examples of their use can be obtained. The five tools explored are a question-answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system; toy sketches of the two simplest, the calculator and the calendar, follow the list below.
1. Question Answering System: The question-answering system is based on another LM that can answer simple factual questions.
2. Calculator: The calculator can perform basic arithmetic operations and returns results rounded to two decimal places.
3. Wikipedia Search: The Wikipedia search engine returns short text snippets from Wikipedia based on a search term.
4. Machine Translation: The machine translation system can translate phrases from any language into English.
5. Calendar: The calendar returns the current date without taking any input, providing a temporal context for predictions that require an awareness of time.
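As promised above, here is a toy Python sketch of the two simplest tools. These are illustrative stand-ins rather than Meta AI’s implementations; in particular, calling eval() on model-generated text is only acceptable in a sandboxed experiment:

```python
import datetime

def calculator(expression: str) -> str | None:
    """Toy calculator tool: evaluates simple arithmetic and rounds the
    result to two decimal places, returning None (no result) on failure."""
    if not all(ch in "0123456789+-*/(). " for ch in expression):
        return None
    try:
        return f"{eval(expression):.2f}"  # unsafe outside a sandbox
    except Exception:
        return None

def calendar(_: str = "") -> str:
    """Toy calendar tool: ignores its input and returns the current date."""
    return datetime.date.today().strftime("Today is %A, %B %d, %Y.")

print(calculator("400 / 1400"))  # "0.29"
print(calendar())                # e.g. "Today is Friday, March 03, 2023."
```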
The Toolformer implementation is based on a finetuned version of GPT-J, a model with only 6.7 billion parameters. Despite its modest size, Toolformer was able to outperform the much larger GPT-3 as well as vanilla GPT-J across several benchmarks.
The ideas behind Toolformer represent a new frontier for LLMs, one in which they not only perform sophisticated language tasks but also complement them with access to external tools and APIs. Can’t wait to see Meta AI expand on these ideas.