TheSequence

🦾 Edge#124: Transformer Architectures Recap

Sep 16, 2021

As requested by many of our readers, before diving deeper into Self-Supervised Learning, we put together a recap of the Transformer Architectures series. As the proverb says: repetition is the mother of learning ;) Let's start with a brief introduction to the category as a whole:

💡 What are Transformers?

Transformer architectures are considered by many to be the most important development in deep learning in recent years. These architectures specialize in processing sequential datasets, which are relevant in domains such as natural language processing (NLP) and computer vision. Before the inception of transformers, that space was dominated by recurrent neural network (RNN) models, such as long short-term memory (LSTM) networks. Transformers challenged the conventional wisdom behind RNN architectures by not relying on processing the input data in order. Instead, transformers rely on attention mechanisms that provide context for any position in the sequence. Transformers have been the cornerstone of groundbreaking models such as Google BERT and OpenAI GPT-3, which set new milestones in NLP scenarios. In recent months, transformers have also been making important inroads into other areas such as computer vision and time-series analysis.
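
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention (the core operation popularized by Google's "Attention Is All You Need" paper) written in plain NumPy. The function name, shapes and toy data below are illustrative assumptions, not code from any of the models covered in the series:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention sketch.

    Q, K, V: arrays of shape (seq_len, d_model).
    Every position attends to every other position, so the context a
    token receives does not depend on processing the sequence in order.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key: shape (seq_len, seq_len)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all value vectors
    return weights @ V

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

In a full transformer this operation is applied in parallel across several attention heads and combined with feed-forward layers, but the core idea is the same: every position is re-represented as a weighted combination of all other positions.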

🤖 Edge#109: Transformer Architectures – the technique that made possible a few major breakthroughs in deep learning; Google’s Attention paper that started the transformer revolution; +Tensor2Tensor.

🤗 Edge#111: The concept of Attention; Google Switch Transformer – the biggest transformer model ever built; +Hugging Face.

🧠 Edge#112 is a deep dive into how DeepMind’s Compressive Transformer improves long-term memory in transformer architectures.

šŸ¢ Edge#113: the architecture of Google BERT; TAPAS – a model that extends BERT’s architecture to work with tabular datasets; +AutoNLP.

šŸ‘ Edge#114 is a deep dive into AI2’s Longformer – a Transformer Model for Long (read it without a subscription).

🤩 Edge#115: The concept of the most famous transformer ever built – GPT-3; two mechanisms from FAIR for improving the current generation of transformer models; +OpenAI API.

šŸ‘ Edge#117: Transformers and Computer Vision; ImageGPT – an adaptation of GPT model to computer vision scenarios; +Hugging Face library.

šŸ‘Æā€ā™€ļø Edge#118 is a deep dive into DeepMind’s Perceiver and Perceiver IO.

šŸ•šŸ•š Edge#121: Transformers and Time Series; Google Research’s paper about temporal fusion transformers; +GluonTS.

🎭 Edge#122 is a deep dive into Unified VLP – a transformer model for visual question answering (VQA).

Next week we will continue with Self-Supervised Learning. Stay tuned!
