Inference Optimisation
Speculative Decoding
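Use a small draft model to propose several tokens ahead, then let the large target model verify all of them in one parallel forward pass, accepting or rejecting each in order. With the standard rejection-sampling acceptance rule the output distribution is identical to the large model's, so the speedup is lossless; reported latency gains on autoregressive generation are typically around 2-3x.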
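The propose-then-verify loop can be sketched in a few lines. This is a minimal toy, not a production implementation: `target_probs` and `draft_probs` are hypothetical stand-ins for the large and small models over a 4-token vocabulary, and the acceptance rule used is the usual rejection-sampling one (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalised positive part of p - q).

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)

def target_probs(prefix):
    """Hypothetical large model: a fixed next-token distribution per prefix."""
    w = [1 + (sum(prefix) * 7 + i * 3) % 5 for i in range(VOCAB)]
    total = sum(w)
    return [x / total for x in w]

def draft_probs(prefix):
    """Hypothetical small model: the target blended with uniform noise."""
    return [0.6 * p + 0.4 / VOCAB for p in target_probs(prefix)]

def sample(weights, rng):
    # random.choices accepts unnormalised, non-negative weights.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def speculative_step(prefix, k, rng):
    """Return between 1 and k + 1 new tokens for one draft/verify round."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted, q_list, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        x = sample(q, rng)
        drafted.append(x)
        q_list.append(q)
        ctx.append(x)

    # 2. The target model scores all k + 1 positions. A real system would do
    #    this in a single batched forward pass; here it is a plain loop.
    p_list = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept or reject left to right.
    out = []
    for i, x in enumerate(drafted):
        p, q = p_list[i], q_list[i]
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)  # accepted: keep the drafted token
        else:
            # Rejected: resample from max(0, p - q) and stop this round.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            out.append(sample(residual, rng))
            return out

    # All k drafts accepted: take one bonus token from the target model.
    out.append(sample(p_list[k], rng))
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step([1, 2], k=3, rng=rng))
```

Each round emits at least one token drawn according to the target model, so generation never stalls, and the closer the draft is to the target, the more drafted tokens survive per round.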
advanced · 9 min read