Streaming Multiprocessor
Core execution block where warps are scheduled and tensor pipelines execute.
Hardware / Architecture
The SM is the main architectural unit in modern GPUs. This page summarizes the key hardware blocks that matter most for AI and inference performance tuning.
Core execution block where warps are scheduled and tensor pipelines execute.
Specialized matrix units for BF16/FP16/FP8 workloads and high-throughput AI kernels.
Registers, shared memory, L2, and HBM/GDDR bandwidth shape achievable performance.
Warp Schedulers
4x
Dispatch Units
8x
CUDA Cores
128x
Tensor Cores
4x
Related Navigation