Normalisation: BatchNorm, LayerNorm, RMSNorm
Why normalisation accelerates training, why transformers use LayerNorm instead of BatchNorm, and why RMSNorm is now the default in Llama-class models.
Intermediate · 7 min read
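As a rough illustration of how the three schemes named above differ, the sketch below applies each one to a small batch of feature vectors using only the standard library. It is a minimal sketch under stated assumptions, not a reference implementation: the helper names, the `EPS` value, and the toy `batch` are hypothetical, and the learnable scale and shift parameters and BatchNorm's running statistics are left out.

```python
import math

EPS = 1e-5  # assumed small constant to avoid division by zero


def _mean(values):
    return sum(values) / len(values)


def _standardise(values):
    # Subtract the mean, then divide by the standard deviation.
    mu = _mean(values)
    var = _mean([(v - mu) ** 2 for v in values])
    return [(v - mu) / math.sqrt(var + EPS) for v in values]


def batch_norm(batch):
    # Statistics are computed per feature (column), across the examples in the batch.
    columns = [_standardise(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    # Statistics are computed per example (row), across its features,
    # so the result does not depend on the other examples in the batch.
    return [_standardise(row) for row in batch]


def rms_norm(batch):
    # Like layer_norm, but with no mean subtraction:
    # each row is only rescaled by its root mean square.
    out = []
    for row in batch:
        rms = math.sqrt(_mean([v * v for v in row]) + EPS)
        out.append([v / rms for v in row])
    return out


# Toy input: 2 examples, 3 features each (illustrative values only).
batch = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
bn, ln, rn = batch_norm(batch), layer_norm(batch), rms_norm(batch)
```

The only differences between the three functions are the axis the statistics are taken over and whether the mean is subtracted. `rms_norm` computes one statistic per row where `layer_norm` computes two.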