Inference Optimisation

Speculative Decoding

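Use a small draft model to propose several tokens ahead, then have the large target model verify them in one parallel forward pass, accepting or rejecting each in order. With the standard acceptance rule the output distribution matches the target model's exactly (lossless), and latency on autoregressive generation typically improves by about 2-3x.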

advanced · 9 min read · Premium
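
The propose-then-verify loop described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `target_probs` and `draft_probs` are hypothetical toy lookup tables standing in for real models, the vocabulary has three tokens, and the target is called in a Python loop where a real system would score all drafted positions in a single batched forward pass. The acceptance rule shown is the usual rejection-sampling scheme: accept a drafted token with probability min(1, p/q), otherwise resample from the normalised residual max(0, p - q).

```python
import random

VOCAB = [0, 1, 2]  # toy vocabulary (assumption for illustration)

def target_probs(context):
    # Hypothetical "large" model: next-token distribution keyed on the last token.
    table = {0: [0.6, 0.3, 0.1], 1: [0.2, 0.5, 0.3], 2: [0.3, 0.3, 0.4]}
    return table[context[-1] if context else 0]

def draft_probs(context):
    # Hypothetical "small" model: cheaper and slightly different from the target.
    table = {0: [0.5, 0.3, 0.2], 1: [0.3, 0.4, 0.3], 2: [0.2, 0.4, 0.4]}
    return table[context[-1] if context else 0]

def sample(weights, rng):
    # Draw one token index; weights need not be normalised.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def speculative_step(context, k, draft, target, rng):
    """Return between 1 and k + 1 new tokens distributed as if sampled from `target`."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted, q_list = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_list.append(q)
        ctx.append(tok)

    # 2. The target scores every prefix (one parallel pass in a real system).
    p_list = [target(list(context) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept or reject the drafted tokens left to right.
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_list[i], q_list[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # accepted: keep the drafted token
        else:
            # Rejected: resample from the residual max(0, p - q) and stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            out.append(sample(residual, rng))
            return out

    # All k drafts accepted: the target's last distribution yields a bonus token.
    out.append(sample(p_list[k], rng))
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step([0], 4, draft_probs, target_probs, rng))
```

Each call emits at least one token, so generation always makes progress, and the speed-up depends on how often the draft agrees with the target: when the two distributions are identical every drafted token is accepted and the step returns k + 1 tokens.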
