02 / Learning Path

Memory Hierarchy

How data moves from registers to VRAM, and where the bottlenecks are.

Topic 01

Register File - Fastest Storage and Spilling

Core concept

The register file is the fastest memory on the GPU. On modern NVIDIA data center parts, each SM exposes 65,536 32-bit registers (256 KiB in total), and each thread gets its own private register allocation.

Registers are on-chip SRAM, so read and write latency is roughly one cycle. This is why keeping hot values in registers is critical for throughput.

  • Each thread can use at most 255 registers.
  • The compiler allocates registers automatically, based on your kernel code.
  • If a kernel needs more registers than the compiler can allocate per thread, the extra values spill to local memory, which is private to the thread but backed by VRAM.
  • Spills preserve correctness but can cause large performance drops because of the added high-latency memory traffic.
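
As a rough illustration of how the two limits above interact, the sketch below estimates how many registers each thread can hold when a given number of threads share one SM. The helper name `register_budget_per_thread` and its `resident_threads` parameter are hypothetical, and the sketch ignores the hardware's register allocation granularity, so real budgets may be slightly lower.

```python
REGISTERS_PER_SM = 65_536    # 32-bit registers per SM, as stated above
MAX_REGS_PER_THREAD = 255    # per-thread limit, as stated above


def register_budget_per_thread(resident_threads: int) -> int:
    """Registers each thread can hold if `resident_threads` share one SM.

    Hypothetical helper for illustration only; it ignores allocation
    granularity and any other per-SM resource limits.
    """
    if resident_threads <= 0:
        raise ValueError("resident_threads must be positive")
    # Split the register file evenly, then apply the per-thread limit.
    return min(MAX_REGS_PER_THREAD, REGISTERS_PER_SM // resident_threads)
```

For instance, `register_budget_per_thread(2048)` evaluates to 32, while `register_budget_per_thread(256)` evaluates to 255 because the even split (256) is capped by the per-thread limit. A kernel that needs more than its budget either runs with fewer resident threads or, if the compiler is held under a cap, spills the excess.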

Spec table

Property                   Value
Registers per SM           65,536
Max registers per thread   255
Access latency             ~1 cycle
Memory type                On-chip SRAM

Practical content

Live calculator

Will my kernel spill?

Inputs: threads per block, registers per thread

Output: total register use vs 65,536 and spill risk state

Visual: fill bar for register usage; overflow arrow to VRAM with SPILL warning
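
The calculator described above can be approximated in a few lines. This is a minimal sketch, not the page's actual widget: the function names, the "OK" / "HIGH" / "SPILL" labels, and the 75% warning threshold are all assumptions. The "SPILL" state follows the calculator's wording; in practice the compiler decides whether a kernel spills, so treat the result as a risk indicator only.

```python
REGISTERS_PER_SM = 65_536    # 32-bit registers per SM
MAX_REGS_PER_THREAD = 255    # per-thread register limit
WARN_FRACTION = 0.75         # assumed threshold for the "HIGH" state


def register_usage(threads_per_block: int, regs_per_thread: int) -> dict:
    """Compare one block's register demand with the SM register file."""
    if threads_per_block <= 0 or regs_per_thread <= 0:
        raise ValueError("inputs must be positive")
    total = threads_per_block * regs_per_thread
    fraction = total / REGISTERS_PER_SM
    if regs_per_thread > MAX_REGS_PER_THREAD or total > REGISTERS_PER_SM:
        state = "SPILL"      # demand cannot be met from registers alone
    elif fraction > WARN_FRACTION:
        state = "HIGH"       # fits, but leaves little headroom
    else:
        state = "OK"
    return {"total_registers": total, "fraction_of_sm": fraction, "state": state}


def fill_bar(fraction: float, width: int = 20) -> str:
    """Render a text fill bar; mark overflow past 100% as a spill to VRAM."""
    filled = min(width, int(fraction * width))
    bar = "[" + "#" * filled + "-" * (width - filled) + "]"
    return bar + (" -> VRAM (SPILL)" if fraction > 1 else "")
```

With 256 threads per block and 64 registers per thread, `register_usage(256, 64)` reports 16,384 registers in use (25% of the SM) and the state "OK". With 1,024 threads and 128 registers per thread, the demand is 131,072 registers and the state is "SPILL".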