Inference Optimisation

FlashAttention

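An IO-aware, exact attention kernel that is both faster and more memory-efficient than the textbook implementation. It tiles the computation so that blocks of the queries, keys, and values are processed in on-chip SRAM, and the full attention matrix is never written to GPU high-bandwidth memory (HBM).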
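
To make the tiling idea concrete, here is a minimal NumPy sketch, not the actual GPU kernel. It assumes single-head, unmasked attention; the function names `naive_attention` and `tiled_attention` and the `block` parameter are hypothetical, with `block` standing in for the tile size that would fit in SRAM. The sketch processes keys and values one block at a time and keeps a running row maximum and softmax denominator, so only a small block of scores exists at any moment.

```python
import numpy as np


def naive_attention(q, k, v):
    """Textbook attention: materialises the full (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block=4):
    """Block-wise attention with a running softmax (illustrative sketch).

    `block` is an assumed stand-in for the tile size that fits in SRAM.
    """
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[-1]))
    for i in range(0, n, block):
        q_blk = q[i:i + block]
        rows = q_blk.shape[0]
        m = np.full(rows, -np.inf)         # running row maximum
        l = np.zeros(rows)                 # running softmax denominator
        acc = np.zeros((rows, v.shape[-1]))  # unnormalised output
        for j in range(0, k.shape[0], block):
            s = q_blk @ k[j:j + block].T * scale  # one small tile of scores
            m_new = np.maximum(m, s.max(axis=-1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)  # rescale earlier partial sums
            l = l * correction + p.sum(axis=-1)
            acc = acc * correction[:, None] + p @ v[j:j + block]
            m = m_new
        out[i:i + block] = acc / l[:, None]
    return out
```

Because the rescaling is exact, the tiled version should match the textbook result up to floating-point error, which is a quick way to check a sketch like this. The speed and memory benefits only appear on hardware where the tiles genuinely live in fast on-chip memory; in NumPy the loop is slower than the naive version.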

advanced · 9 min read · Premium
