Training Infrastructure
ZeRO and FSDP
How sharding optimiser state, gradients, and parameters across data-parallel ranks turns a memory problem into a bandwidth problem, and why FSDP is now the PyTorch default.
advanced · 9 min read · Premium
This concept is for Pro members.
Unlock the full library, study plans, the AI mentor, and daily emails.
See plans