← Concept library

Training Infrastructure

ZeRO and FSDP

How sharding optimiser state, gradients, and parameters across data-parallel ranks turns a memory problem into a bandwidth problem, and why FSDP is now the PyTorch default.

advanced · 9 min read · Premium

This concept is for Pro members.

Unlock the full library, study plans, the AI mentor, and daily emails.

See plans