← Concept library

Applied LLMs

Tensor Cores and Matrix Engines

Tensor Cores are specialised matrix-multiply-accumulate units on modern GPUs that deliver peak FLOP/s only when operand shapes and numeric formats are chosen correctly.

intermediate · 8 min read · Premium

A single Volta SM shipped in 2017 could execute 8 TFLOP/s in FP32. The same SM with Tensor Cores enabled jumped to 125 TFLOP/s in FP16 with FP32 accumulation. That is a 15x headline ratio from a single microarchitectural addition. The catch: that number is only reachable if you feed the hardware exactly what it was built to consume.

What a Tensor Core actually does

A Tensor Core is a fixed-function unit that performs one operation per clock:

Keep reading with Pro.

You're reading the preview. Unlock the full concept plus the library, study plans, the AI mentor, and daily emails.

Sign in to save and react.
Share Copied