Understanding TRANSFORMERS — Leaving RNNs behind
This post collects useful resources for understanding Transformers.
The links are listed in the order in which I read them:
- https://rubikscode.net/2019/07/29/introduction-to-transformers-architecture/ This one starts with an overview of RNNs, attention, and self-attention, and then discusses the basic concepts of Transformers.
- http://jalammar.github.io/illustrated-transformer/ This is the most important blog post on Transformers; it offers an in-depth, step-by-step walkthrough of the architecture.
- https://glassboxmedicine.com/2019/09/07/universal-transformers/ This first gives an overview of Transformers, and the structure of the post is simple and easy to follow. It then explains Universal Transformers with some animations.
- https://kazemnejad.com/blog/transformer_architecture_positional_encoding/ This one explains positional encoding, which is applied to the inputs of both the encoder and the decoder of the Transformer (I just skimmed through it!).
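To make the positional-encoding idea from the last link concrete, here is a minimal NumPy sketch of the sinusoidal scheme from "Attention Is All You Need" (my own toy version, not code from any of the posts above): each position gets a vector of sines and cosines at different frequencies, which is simply added to the token embeddings.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

    Returns an array of shape (max_len, d_model) that is added to the
    token embeddings so the model can make use of token order.
    d_model is assumed to be even.
    """
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

Note that position 0 encodes as alternating 0s and 1s (sin 0 and cos 0), and nearby positions get similar vectors, which is what lets the model attend by relative position.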
I would recommend studying Transformers and Universal Transformers together, i.e., in continuation.
After going through these posts, it is much easier to understand the Transformer paper "Attention Is All You Need" [1] and the Universal Transformers paper [2].
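If you want a quick feel for the core operation in [1] before reading the paper, here is a minimal NumPy sketch of scaled dot-product self-attention (my own illustration, with toy random data; real implementations add masking, multiple heads, and learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

# Toy self-attention: 3 tokens with d_model = 4, using Q = K = V = X.
X = np.random.randn(3, 4)
out, weights = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): each token is a weighted mix of all tokens
```

Each row of `weights` is a probability distribution over the input tokens, which is exactly the "which tokens should I look at" picture drawn in the Illustrated Transformer post.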
References
[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
[2] Dehghani, Mostafa, et al. "Universal transformers." arXiv preprint arXiv:1807.03819 (2018).