9 Mixtape - Softmax Bottleneck
Softmax bottleneck
As discussed yesterday, the softmax bottleneck is the problem that a low-dimensional representation can only carry a limited amount of information. In other words, a softmax over low-dimensional embedding vectors cannot model the complexity of natural language as accurately as needed.
The proof rests on the inability of the softmax to represent the ground truth (P*), as stated in the corollary below:
Where d is the embedding dimension of the softmax function. It is provable that if d is smaller than the rank of the ground truth (more precisely, of its log-probability matrix), there exists a context where the model's output will not equal the ground truth. The softmax is therefore not expressive enough, because the rank of what it can represent is capped by d.
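The rank argument can be checked numerically. The sketch below is only an illustration, not code from the paper. It assumes the usual setup: logits are H W^T, with context vectors H and word embeddings W both of dimension d, and the ground truth is a matrix A of log-probabilities log P*(x|c) with one row per context. The function names, the matrix sizes and the random stand-in for the ground truth are all made up for the example. The logits have rank at most d, and the log-softmax only subtracts a per-row constant, so the model's log-probability matrix has rank at most d + 1. If A has a higher rank than that, no choice of H and W can reproduce every row.

```python
import numpy as np

def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def model_log_prob_rank(num_contexts, vocab_size, d, seed=0):
    # Rank of the log-probability matrix a d-dimensional softmax produces.
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(num_contexts, d))  # hypothetical context vectors
    W = rng.normal(size=(vocab_size, d))    # hypothetical word embeddings
    return np.linalg.matrix_rank(log_softmax(H @ W.T))

def true_log_prob_rank(num_contexts, vocab_size, seed=1):
    # Stand-in "ground truth": log-probabilities built from unconstrained
    # random logits, so the matrix is generically (close to) full rank.
    rng = np.random.default_rng(seed)
    A = log_softmax(rng.normal(size=(num_contexts, vocab_size)))
    return np.linalg.matrix_rank(A)

if __name__ == "__main__":
    d = 5
    print("model rank:", model_log_prob_rank(50, 40, d), "(bound: d + 1 =", d + 1, ")")
    print("ground-truth rank:", true_log_prob_rank(50, 40))
```

Raising d until d + 1 reaches the rank of the stand-in ground truth removes the gap in this toy setting. That is the sense in which the bottleneck comes from the embedding dimension rather than from training.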