Training Infrastructure

Gradient Checkpointing, Activation Recomputation, and CPU Offload

Why activations, rather than weights, usually dominate training memory, and how recomputation and CPU/NVMe offload trade extra compute and transfer bandwidth for the memory needed to fit larger models.

intermediate · 9 min read · Premium
