

InfiniBand and Inter-Node Networking

InfiniBand provides low-latency, high-bandwidth RDMA links between GPU nodes. Understanding its topology and collective communication patterns is essential for diagnosing and mitigating the network bottleneck in large-scale training.


A single H100 SXM GPU delivers roughly 989 TFLOPS of dense BF16 tensor throughput, yet it typically reaches the rest of the cluster through one 400 Gb/s InfiniBand link, about 50 GB/s. When you are synchronising gradients across thousands of such cards, the arithmetic units often sit idle, waiting for bytes to arrive. Inter-node networking is not a footnote to the accelerator story; for many workloads it is the binding constraint.
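To make the imbalance concrete, here is a back-of-envelope sketch comparing one data-parallel gradient all-reduce with the compute of one training step. It assumes a flat ring all-reduce in which each rank sends 2(N-1)/N of the gradient payload over a single 50 GB/s link, ignores per-message latency and any overlap of communication with compute, and uses the common 6 × parameters × tokens FLOP estimate against the H100 SXM's dense BF16 peak of about 989 TFLOPS. The model size, tokens per rank, rank count, and 40% utilisation are hypothetical placeholders, not measurements.

```python
"""Back-of-envelope: gradient all-reduce time vs. compute time per step.

Illustrative assumptions only: flat ring all-reduce over the inter-node
link, bandwidth term only (no per-message latency), no overlap of
communication with the backward pass.
"""


def ring_allreduce_seconds(payload_bytes: float, n_ranks: int, link_bytes_per_s: float) -> float:
    # In a ring all-reduce each rank sends (and receives) 2*(N-1)/N of the payload:
    # (N-1)/N during the reduce-scatter phase and (N-1)/N during the all-gather phase.
    if n_ranks < 2:
        return 0.0
    bytes_on_wire = 2 * (n_ranks - 1) / n_ranks * payload_bytes
    return bytes_on_wire / link_bytes_per_s


def compute_seconds(n_params: float, tokens_per_rank: float, peak_flops: float, mfu: float) -> float:
    # Rough estimate: ~6 FLOPs per parameter per token for forward + backward,
    # divided by the throughput actually achieved (peak * utilisation).
    return 6 * n_params * tokens_per_rank / (peak_flops * mfu)


if __name__ == "__main__":
    n_params = 7e9                 # hypothetical 7B-parameter model
    grad_bytes = 2 * n_params      # BF16 gradients: 2 bytes per parameter
    link = 400e9 / 8               # 400 Gb/s link expressed in bytes/s (50 GB/s)
    comm = ring_allreduce_seconds(grad_bytes, n_ranks=1024, link_bytes_per_s=link)
    comp = compute_seconds(n_params, tokens_per_rank=4096, peak_flops=989e12, mfu=0.4)
    print(f"all-reduce ~{comm:.3f} s, compute ~{comp:.3f} s")
```

With these placeholder numbers the all-reduce takes about 0.56 s against about 0.43 s of compute, so the GPUs would spend more time waiting than calculating. Real systems narrow the gap by overlapping communication with the backward pass and by reducing over NVLink inside each node before crossing the inter-node fabric, both of which this sketch deliberately leaves out.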

What InfiniBand actually is

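InfiniBand (IB) is a switched-fabric interconnect standard defined by the InfiniBand Trade Association. Ethernet was designed as a best-effort network for bursty packet traffic: under congestion it may drop frames and leave retransmission to higher layers, so latency guarantees are loose. IB, by contrast, was engineered from the start as a lossless fabric with consistent sub-microsecond latency and predictable throughput, and its adapters implement RDMA in hardware, so one node can read or write another node's registered memory without involving the remote CPU.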
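A large part of IB's predictability comes from link-level, credit-based flow control: the receiver advertises how many buffer slots it has free, and the sender transmits only while it holds credits, so a congested link stalls the sender instead of dropping packets. The toy single-link model below contrasts that with a sender that transmits regardless and loses whatever overflows the receive buffer. It is a hypothetical illustration; the function name, tick granularity, buffer size, and rates are made up and are not InfiniBand specification values.

```python
from collections import deque


def simulate(ticks: int, buffer_slots: int, send_rate: int, drain_rate: int, credit_based: bool) -> dict:
    """Toy single-link model (hypothetical, not the IB wire protocol).

    Each tick the sender tries to transmit `send_rate` packets, then the
    receiver drains up to `drain_rate` packets from its buffer.
    """
    buffer = deque()
    credits = buffer_slots      # receiver starts by advertising every free slot
    sent = delivered = dropped = 0
    for _tick in range(ticks):
        for _ in range(send_rate):
            if credit_based:
                if credits == 0:
                    break       # no credit: the sender stalls rather than overrun the receiver
                credits -= 1
            if len(buffer) < buffer_slots:
                buffer.append(sent)
            else:
                dropped += 1    # buffer overrun; unreachable when credits are enforced
            sent += 1
        for _ in range(min(drain_rate, len(buffer))):
            buffer.popleft()
            delivered += 1
            if credit_based:
                credits += 1    # each freed slot is returned to the sender as a credit
    return {"sent": sent, "delivered": delivered, "dropped": dropped}


if __name__ == "__main__":
    print("no credits:", simulate(10, 8, send_rate=4, drain_rate=2, credit_based=False))
    print("credits:   ", simulate(10, 8, send_rate=4, drain_rate=2, credit_based=True))
```

Both runs deliver the same 20 packets, because the receiver's drain rate is the bottleneck either way. The difference is that the uncredited sender puts 40 packets on the wire and loses 14 of them, which a real transport would have to detect and retransmit, while the credit-based sender transmits only 26 and loses none. The invariant that makes this work is that credits plus occupied buffer slots always equal the buffer size.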
