torch.compile and TorchInductor
torch.compile captures PyTorch code as graphs at runtime via TorchDynamo, then lowers those graphs through TorchInductor to fused Triton or C++ kernels, delivering average speedups of 20-36% with no model rewrites.
PyTorch 2.0 shipped a single function that, averaged over 165 open-source models, made them run 20% faster at float32 and 36% faster under automatic mixed precision (AMP), with no changes to the model code. That function is torch.compile. Understanding why it works requires following the graph from Python bytecode down to machine code.
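To make "no changes to the model code" concrete, here is a minimal sketch of wrapping an existing module. The TinyMLP model and the compile_and_check helper are hypothetical names invented for illustration. The backend argument defaults to "inductor" (TorchInductor), which needs a working C++ compiler on CPU or Triton on GPU; passing "eager" still runs graph capture but skips code generation, which can be handy for a quick sanity check.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Hypothetical toy model; any nn.Module is wrapped the same way."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def compile_and_check(backend: str = "inductor") -> bool:
    """Compile the model and confirm it matches eager-mode output."""
    torch.manual_seed(0)
    model = TinyMLP().eval()
    x = torch.randn(8, 16)

    # The model definition is untouched; only this wrapper call is new.
    compiled_model = torch.compile(model, backend=backend)

    with torch.no_grad():
        expected = model(x)         # ordinary eager execution
        actual = compiled_model(x)  # first call triggers compilation

    return torch.allclose(expected, actual, atol=1e-5)
```

The first call to the compiled model is slow because compilation happens there; any speedup shows up on later calls with compatible input shapes.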
The compilation pipeline in four layers
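torch.compile is not one tool but a stack of four cooperating components: TorchDynamo (graph capture from Python bytecode), AOTAutograd (ahead-of-time tracing of the backward pass), PrimTorch (decomposition of operators into a smaller primitive set), and TorchInductor (code generation to Triton or C++ kernels).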
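The top of the stack, TorchDynamo's graph capture, can be observed directly by passing a custom backend to torch.compile. The sketch below assumes the documented custom-backend contract, a callable that receives a torch.fx.GraphModule plus example inputs and returns a callable. The names inspecting_backend, capture, and f are hypothetical. This backend only records the captured graph and runs it unoptimized, so it shows what TorchDynamo hands to the later stages, not what TorchInductor generates.

```python
import torch
import torch._dynamo

captured_graphs = []


def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    """Record the FX graph TorchDynamo captured, then run it as-is."""
    captured_graphs.append(gm)
    return gm.forward  # no optimization: execute the captured graph directly


def f(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + torch.cos(x)


def capture(fn, *args):
    """Compile fn with the inspecting backend and return (output, graphs)."""
    captured_graphs.clear()
    torch._dynamo.reset()  # drop cached compilations so capture runs again
    compiled = torch.compile(fn, backend=inspecting_backend)
    out = compiled(*args)
    return out, list(captured_graphs)


if __name__ == "__main__":
    _, graphs = capture(f, torch.randn(4))
    for gm in graphs:
        print(gm.graph)  # lists placeholder, call_function, and output nodes
```

Swapping inspecting_backend for the default backend sends the same captured graph through the rest of the stack instead of running it unchanged.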