Edge 260: Data2vec 2.0 is Meta AI's New Self-Supervised Learning Model for Vision, Speech and Text
The model is one of the most impressive achievements in self-supervised learning research to date.
Early last year, Meta AI unveiled Data2vec, one of the first self-supervised learning models to master tasks across different domains such as speech, text and vision. The model was one of the first iterations in Meta AI’s self-supervised architectures that emulate human learning processes using different sensory inputs. A few weeks ago, Meta AI followed up with Data2vec 2.0, a new version of the model that delivers up to a 16x improvement in training speed.
The original Data2vec architecture is based on a student network and a teacher network. The teacher network computes latent representations from the full text, image, or speech input. The student network receives a masked version of the same input and attempts to predict the teacher’s latent representations for the masked portions. The two neural networks are nearly identical.
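The student–teacher loop described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not Meta AI's implementation: the real model uses a Transformer encoder, while here a single tanh layer with a fixed context-mixing matrix stands in for self-attention, and all dimensions and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the real model is a Transformer encoder.
SEQ_LEN, DIM = 16, 8

# Fixed "context mixing" matrix so each position sees the others,
# a crude stand-in for self-attention (assumption for this sketch).
MIX = 0.5 * (np.eye(SEQ_LEN) + np.full((SEQ_LEN, SEQ_LEN), 1.0 / SEQ_LEN))

def encode(x, w):
    """Stand-in encoder: context mixing followed by one tanh layer."""
    return np.tanh((MIX @ x) @ w)

# Student and teacher start identical; the teacher then tracks the
# student via an exponential moving average (EMA) of its weights.
student_w = 0.1 * rng.normal(size=(DIM, DIM))
teacher_w = student_w.copy()

def train_step(x, mask, student_w, teacher_w, lr=0.5, tau=0.99):
    # 1. Teacher encodes the FULL input to produce target latents.
    targets = encode(x, teacher_w)

    # 2. Student encodes a MASKED version of the same input.
    x_masked = np.where(mask[:, None], 0.0, x)
    hidden = MIX @ x_masked
    preds = np.tanh(hidden @ student_w)

    # 3. Regression loss: predict the teacher's latents at masked positions.
    diff = preds[mask] - targets[mask]
    loss = float(np.mean(diff ** 2))

    # 4. Manual gradient step for the toy tanh layer.
    grad_out = 2.0 * diff * (1.0 - preds[mask] ** 2) / diff.size
    student_w = student_w - lr * (hidden[mask].T @ grad_out)

    # 5. EMA update: the teacher slowly follows the student.
    teacher_w = tau * teacher_w + (1.0 - tau) * student_w
    return loss, student_w, teacher_w

x = rng.normal(size=(SEQ_LEN, DIM))
mask = rng.random(SEQ_LEN) < 0.5   # real training resamples masks each batch
losses = []
for _ in range(300):
    loss, student_w, teacher_w = train_step(x, mask, student_w, teacher_w)
    losses.append(loss)
```

Because the teacher only moves as a slow average of the student, its targets stay stable from step to step, which is what lets the student regress onto them without the representations collapsing.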