Edge 342: Who's Harry Potter? Inside One of the Most Fascinating Papers Published This Year
Microsoft Research details a fine-tuning method for unlearning concepts in LLMs
Large language models (LLMs) are regularly trained on vast amounts of unlabeled data, which leads them to acquire knowledge of incredibly diverse subjects. The datasets used in the pretraining of LLMs often include copyrighted material, triggering both legal and ethical concerns for developers, users, and original content creators. Quite often, specific knowledge needs to be removed from an LLM in order to adapt it to a particular domain. While learning in LLMs is certainly impressive, the unlearning of specific concepts remains a very nascent area of exploration. Fine-tuning methods are certainly effective for incorporating new concepts, but can they be used to unlearn specific knowledge?
In one of the most fascinating papers of this year, Microsoft Research explores an unlearning technique for LLMs. The challenge was nothing less than making Llama-7B forget any knowledge of Harry Potter.
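To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' released code) of what a fine-tuning-based unlearning step could look like: instead of training on the original text, the model is nudged toward "generic" next-token distributions in which concept-specific tokens are suppressed, using a "reinforced" model as a probe for those tokens. The checkpoint names, the reinforced-model path, `alpha`, and the learning rate are all illustrative assumptions.

```python
# Hypothetical sketch: fine-tune a model toward "generic" next-token
# targets so that concept-specific completions are suppressed.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
base_name = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name).to(device)          # model being unlearned
frozen = AutoModelForCausalLM.from_pretrained(base_name).to(device).eval()  # frozen baseline reference
# A hypothetical "reinforced" model, further fine-tuned ON the target
# content, whose logits highlight concept-specific tokens.
reinforced = AutoModelForCausalLM.from_pretrained("path/to/reinforced").to(device).eval()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative learning rate
alpha = 1.0  # how strongly to suppress concept-specific tokens

def unlearning_step(text: str) -> float:
    """One gradient step toward generic next-token predictions."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        v_base = frozen(ids).logits
        v_reinf = reinforced(ids).logits
        # Down-weight tokens the reinforced model prefers over the baseline:
        # these are the tokens most associated with the concept to forget.
        v_generic = v_base - alpha * F.relu(v_reinf - v_base)
        target = F.softmax(v_generic, dim=-1)
    logits = model(ids).logits
    # Soft cross-entropy between the model's next-token distribution
    # and the generic target distribution at every position.
    loss = -(target * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: repeatedly step on passages from the content to be forgotten.
# loss = unlearning_step("Harry gripped his wand and stared at the castle gates.")
```

In this sketch, the gradient signal comes from the gap between the reinforced and baseline logits, so ordinary language-modeling behavior is largely left alone while predictions tied to the target concept are pushed toward generic alternatives.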