Edge 404: Learn About Meta AI's Promising Technique to Predict Multiple Tokens at the Same Time in LLMs
The method addresses the limitations of the classic next token prediction approach.
Next token prediction has long been associated with the magic of LLMs. The idea that models trained to “predict the next word” could show such incredible capabilities at scale is nothing short of remarkable. However, next token prediction also represents one of the major limitations of LLMs. Scale is the most obvious limitation of the next token prediction model, as there is only so much you can do one token at a time. Probably the more interesting limitation is related to cost, as LLMs assign the same computational cost to every token regardless of its importance for a given task. Finally, single-token prediction methods predominantly capture local patterns and neglect complex decision-making, requiring significantly more data than human children to achieve comparable fluency levels. Recently, a collaboration of researchers from Meta AI Research, CERMICS, Ecole des Ponts ParisTech and LISN Université Paris-Saclay produced a paper proposing a multi-token prediction technique that aims to address some of the aforementioned limitations in LLMs.
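To make the core idea concrete, here is a minimal PyTorch sketch of multi-token prediction: a shared trunk produces a hidden state, and several independent heads each predict a token a different number of steps ahead, sharing a single unembedding matrix. The class and function names, dimensions, and head design here are illustrative assumptions, not Meta's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTokenHeads(nn.Module):
    """Sketch: n independent heads over a shared trunk, one per future
    token offset, with a shared unembedding matrix (hypothetical names)."""

    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        # One lightweight projection head per future token offset.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(n_future)
        )
        # Single unembedding matrix shared across all heads.
        self.unembed = nn.Linear(hidden_dim, vocab_size, bias=False)
        self.n_future = n_future

    def forward(self, trunk_hidden: torch.Tensor) -> torch.Tensor:
        # trunk_hidden: (batch, seq_len, hidden_dim) from the shared trunk.
        # Returns logits of shape (n_future, batch, seq_len, vocab_size).
        return torch.stack(
            [self.unembed(head(trunk_hidden)) for head in self.heads]
        )


def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Sum of cross-entropy losses: head i is trained to predict the
    token i+1 positions ahead of the current position."""
    loss = torch.zeros((), device=logits.device)
    for i in range(logits.shape[0]):
        pred = logits[i, :, : -(i + 1), :]  # positions that have a target
        target = tokens[:, i + 1 :]         # targets shifted i+1 steps ahead
        loss = loss + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), target.reshape(-1)
        )
    return loss
```

Under this sketch, setting `n_future = 1` recovers ordinary next token prediction, while larger values force each position's hidden state to carry enough information to anticipate several upcoming tokens at once.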