Tensor Cores and Matrix Engines
Tensor Cores are specialised matrix-multiply-accumulate units on modern GPUs that deliver peak FLOP/s only when operand shapes and numeric formats are chosen correctly.
A Volta V100 GPU, shipped in 2017, peaks at roughly 15.7 TFLOP/s in FP32. With Tensor Cores, the same chip reaches 125 TFLOP/s in FP16 with FP32 accumulation. That is roughly an 8x headline ratio from a single microarchitectural addition. The catch: that number is only reachable if you feed the hardware exactly what it was built to consume.
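The 125 TFLOP/s figure can be reproduced with back-of-envelope arithmetic. The sketch below assumes NVIDIA's published V100 (SXM2) specifications: 80 SMs, 8 Tensor Cores per SM, 64 fused multiply-adds per Tensor Core per clock, and a boost clock of about 1.53 GHz. The constant and function names are illustrative only.

```python
# Assumed figures from NVIDIA's published V100 (SXM2) specifications.
SM_COUNT = 80
TENSOR_CORES_PER_SM = 8
FMA_PER_TENSOR_CORE_PER_CLOCK = 64  # one 4x4x4 matrix multiply-accumulate
FLOPS_PER_FMA = 2                   # a multiply and an add
BOOST_CLOCK_HZ = 1.53e9


def peak_tensor_tflops() -> float:
    """Theoretical Tensor Core peak for the whole GPU, in TFLOP/s."""
    flops_per_clock = (
        SM_COUNT
        * TENSOR_CORES_PER_SM
        * FMA_PER_TENSOR_CORE_PER_CLOCK
        * FLOPS_PER_FMA
    )
    return flops_per_clock * BOOST_CLOCK_HZ / 1e12


print(f"{peak_tensor_tflops():.1f} TFLOP/s")  # prints "125.3 TFLOP/s"
```

This is a ceiling, not a measurement: it assumes every Tensor Core issues a full operation on every cycle at boost clock, which real kernels rarely sustain.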
What a Tensor Core actually does
A Tensor Core is a fixed-function unit that performs one operation per clock:
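In NVIDIA's public description of Volta, that operation is a 4x4 matrix multiply-accumulate, D = A·B + C, where A and B are FP16 and the accumulator can be FP16 or FP32. The following is a minimal software model of it, not a description of the hardware's internals. It uses only the Python standard library, and the helper names (`to_fp16`, `to_fp32`, `mma_4x4`) are hypothetical. Rounding the accumulator after every addition is an assumption of this model, since the exact internal rounding behaviour is not something the text specifies.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c, round_acc=to_fp32):
    """Model D = A @ B + C on 4x4 matrices (64 multiply-adds in total).

    A and B are rounded to FP16; the running sum is rounded with
    `round_acc` after every addition (FP32 by default).
    """
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = round_acc(c[i][j])
            for k in range(n):
                # The product of two FP16 values fits exactly in FP32.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = round_acc(acc + prod)
            d[i][j] = acc
    return d


# Small products added to a large accumulator: each product is 2**-12.
a = [[2.0 ** -6] * 4 for _ in range(4)]
b = [[2.0 ** -6] * 4 for _ in range(4)]
c = [[1.0] * 4 for _ in range(4)]

d_fp32 = mma_4x4(a, b, c, round_acc=to_fp32)
d_fp16 = mma_4x4(a, b, c, round_acc=to_fp16)

print(d_fp32[0][0])  # 1.0009765625 (1 + 2**-10)
print(d_fp16[0][0])  # 1.0 (each 2**-12 addend is rounded away)
```

The comparison illustrates why FP32 accumulation matters: with an FP16 accumulator, addends smaller than half a unit in the last place of the running sum vanish entirely.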