5 ULMFiT Analysis
Low-Shot Learning
A key property of transfer learning is low-shot learning: you can train models that match or beat from-scratch baselines with 1-2 orders of magnitude (10x-100x) fewer labeled examples. That is amazing!
On IMDb and AG News, the paper reports that ULMFiT with only 100 labeled examples matches the performance of training from scratch with 10x and 20x more data, respectively, demonstrating the benefit of general-domain LM pretraining.
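As a rough sketch of what the low-shot setting looks like in practice (fastai v2 API; the CSV path, column names, and training lengths are my own illustrative assumptions, not from the paper):

```python
# Low-shot sketch: fine-tune a pretrained ULMFiT classifier on only
# 100 labeled examples. File and column names are hypothetical.
import pandas as pd
from fastai.text.all import *

df = pd.read_csv("imdb_train.csv")                          # hypothetical file
df_small = df.groupby("label").sample(50, random_state=42)  # 100 examples, balanced

dls = TextDataLoaders.from_df(df_small, text_col="text", label_col="label",
                              valid_pct=0.2, seed=42)
learn = text_classifier_learner(dls, AWD_LSTM, pretrained=True, metrics=accuracy)
learn.fine_tune(4)       # freeze-then-unfreeze schedule built into fastai
print(learn.validate())  # loss/accuracy on the held-out split
```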
Impact of pretraining
In every instance, pretraining improves performance and reduces the amount of labeled data needed, regardless of target dataset size.
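A minimal way to see this contrast with fastai (reusing the hypothetical dls from the sketch above; pretrained is a real fastai flag, the epoch counts and learning rate are arbitrary):

```python
# Same architecture, two initializations: Wikitext-103 weights vs. random.
from fastai.text.all import *

learn_pre = text_classifier_learner(dls, AWD_LSTM, pretrained=True, metrics=accuracy)
learn_pre.fine_tune(4)                 # transfer from the pretrained LM

learn_scratch = text_classifier_learner(dls, AWD_LSTM, pretrained=False, metrics=accuracy)
learn_scratch.unfreeze()
learn_scratch.fit_one_cycle(4, 1e-2)   # no transfer; random init end to end
```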
Impact of LM quality
LM quality matters: the regularized AWD-LSTM LM outperforms a vanilla LM without dropout, but even the vanilla LM still provides benefits over training from scratch.
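One way to approximate this comparison in fastai is to scale the AWD-LSTM dropouts to zero via drop_mult (a sketch under my own assumptions; the data file is hypothetical and the schedules are arbitrary):

```python
import pandas as pd
from fastai.text.all import *

df = pd.read_csv("imdb_train.csv")                    # hypothetical file
dls_lm = TextDataLoaders.from_df(df, text_col="text", is_lm=True)

# "Vanilla" LM: zero out all AWD-LSTM dropouts via drop_mult.
lm_vanilla = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.0, pretrained=False)
# Regularized LM: keep the full AWD-LSTM dropout scheme.
lm_awd = language_model_learner(dls_lm, AWD_LSTM, drop_mult=1.0, pretrained=False)

lm_vanilla.fit_one_cycle(1, 1e-2)   # trains, but overfits faster
lm_awd.fit_one_cycle(1, 1e-2)       # better generalization
```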
Impact of classifier fine-tuning
Classifier fine-tuning improves results by about a factor of 2 or more on IMDb; the effect on TREC-6 is smaller, but still significant. See the table below.
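For reference, the classifier fine-tuning recipe the paper ablates (gradual unfreezing plus discriminative learning rates, with fastai's one-cycle schedule standing in for the slanted triangular one) looks roughly like this; all learning rates are illustrative:

```python
from fastai.text.all import *

# dls is a hypothetical classification DataLoaders, as in the earlier sketches.
learn = text_classifier_learner(dls, AWD_LSTM, pretrained=True, metrics=accuracy)

learn.fit_one_cycle(1, 2e-2)                        # train only the new head
learn.freeze_to(-2)                                 # also unfreeze the last layer group
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))  # discriminative learning rates
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))
learn.unfreeze()                                    # finally fine-tune everything
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
```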