Training Infrastructure
Gradient Checkpointing, Activation Recomputation, and CPU Offload
Why activations, not weights, usually dominate training memory, and how recomputation and CPU/NVMe offload trade compute and bandwidth to fit larger models.
intermediate · 9 min read