aPaperADay
37 Attention in Natural Language Processing
This paper gives a great overview of the taxonomy of attention mechanisms.
2021-07-22
36 SimCSE, Simple Contrastive Learning of Sentence Embeddings
The SimCSE framework is extremely simple: it uses dropout itself as the data augmentation.
2021-07-20
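A minimal sketch of the SimCSE idea (not the paper's code; the toy dropout "encoder" here stands in for the BERT encoder SimCSE actually uses): the same batch is passed through the encoder twice, dropout makes the two outputs differ, and a contrastive loss pulls each sentence's two views together against the rest of the batch.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, p=0.1):
    """Toy 'encoder': identity plus dropout (hypothetical stand-in
    for a real Transformer encoder with dropout enabled)."""
    mask = rng.random(x.shape) > p
    return x * mask / (1 - p)

def info_nce(z1, z2, temp=0.05):
    """Contrastive loss: z1[i] should match z2[i] against all other z2[j]."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sims = z1 @ z2.T / temp                      # (batch, batch) cosine similarities
    # cross-entropy with the diagonal (the positive pairs) as the correct class
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Two forward passes of the SAME batch -> two dropout-augmented views.
batch = rng.normal(size=(4, 8))
z1, z2 = encode(batch), encode(batch)
loss = info_nce(z1, z2)
```

Everything but the two-pass trick is standard contrastive learning; dropout noise is the whole augmentation.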
35 AN IMAGE IS WORTH 16X16 WORDS
Vision Transformers (ViT) are a simple extension of the Transformer to image data.
2021-07-15
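The title's "16x16 words" can be made concrete in a few lines (a sketch, not the paper's code): a 224x224 RGB image is cut into 16x16 patches, and each flattened patch becomes one input token.

```python
import numpy as np

img = np.zeros((224, 224, 3))   # one RGB image at ViT's standard resolution
P = 16                          # patch size from the paper's title
H, W, C = img.shape

# Split into a 14x14 grid of 16x16 patches, then flatten each patch to a token.
patches = (img.reshape(H // P, P, W // P, P, C)
              .swapaxes(1, 2)
              .reshape(-1, P * P * C))
# (224/16)^2 = 196 "words", each a 16*16*3 = 768-dim vector
```

In the real model these flattened patches are linearly projected before entering the Transformer; this sketch stops at the patchify step.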
34 Combiner- Full Attention Transformer with Sparse Computation Cost
This paper proposes a way to reduce attention cost from O(N^2) to O(N log N).
2021-07-14
33 MOCOv3 An Empirical Study of Training Self-Supervised Vision Transformers
MoCo v3 is an incremental improvement on the MOmentum COntrast framework.
2021-07-13
32 DINO Emerging Properties in Self-Supervised Vision Transformers
This paper looks at the incredible power (and compute appetite) of ViT trained on fully self-supervised tasks.
2021-07-12
31 BertGeneration
This paper runs thousands of TPU experiment hours to determine the best ways to combine publicly available checkpoints for text-generation tasks.
2021-07-10
30 BART
Similar to BERT but geared toward generation like GPT. I'd suggest the name BgpRT.
2021-07-09
29 ALBERT A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS
This paper shows how to lighten up BERT when he's feeling weighed down by too many parameters.
2021-07-08
28 BEiT
BEiT stands for Bidirectional Encoder representation from Image Transformers.
2021-07-07