
07 / Learning Path

Libraries & Frameworks

cuBLAS/cuDNN/Triton/PyTorch and profiling workflow.


cuBLAS GEMM and Tensor Core Usage

Theory

cuBLAS is NVIDIA's high-performance BLAS library, and GEMM (general matrix multiply) is the core operation behind many deep learning workloads.

For GEMM shapes and datatypes that match hardware-supported paths (for example FP16, BF16, or TF32 inputs with suitably aligned dimensions), cuBLAS can route the work to Tensor Cores for major throughput gains.
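What the Tensor Core FP16 path does numerically can be sketched in NumPy: inputs are stored in FP16, but products are accumulated in FP32. This is an illustrative emulation, not the cuBLAS API; the shapes and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 64, 128, 32  # illustrative shapes only

# Inputs stored in FP16, as a Tensor Core FP16 GEMM path would see them.
a16 = rng.standard_normal((m, k)).astype(np.float16)
b16 = rng.standard_normal((k, n)).astype(np.float16)

# Tensor Cores multiply FP16 tiles but accumulate in FP32;
# emulate that by upcasting before the multiply.
c_mixed = a16.astype(np.float32) @ b16.astype(np.float32)

# Accumulating entirely in FP16 loses more precision over long reductions.
c_fp16 = (a16 @ b16).astype(np.float32)

# High-precision reference to measure error against.
c_ref = a16.astype(np.float64) @ b16.astype(np.float64)

err_mixed = float(np.max(np.abs(c_mixed - c_ref)))
err_fp16 = float(np.max(np.abs(c_fp16 - c_ref)))
print(err_mixed <= err_fp16)  # FP32 accumulation is at least as accurate
```

The accuracy gap between the two accumulation modes grows with the reduction length k, which is why mixed-precision GEMM keeps FP32 accumulators even when inputs are FP16.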

Key points

GEMM computes C = alpha * A * B + beta * C, where alpha and beta are scalars.
  • GEMM is the backbone of linear layers and attention projections.
  • Tensor Cores execute matrix-tile math more efficiently than standard scalar FP32 pipelines.
  • cuBLAS selects optimized kernels based on shape, datatype, and architecture.
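The GEMM update above can be sketched in NumPy, including how a linear layer maps onto it. This is a semantic sketch, not a cuBLAS call; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 4, 3, 5
alpha, beta = 2.0, 0.5

a = rng.standard_normal((m, k)).astype(np.float32)
b = rng.standard_normal((k, n)).astype(np.float32)
c = rng.standard_normal((m, n)).astype(np.float32)

# GEMM: C <- alpha * (A @ B) + beta * C
c_new = alpha * (a @ b) + beta * c

# A linear layer y = x @ W.T + bias is a GEMM with alpha = 1 and
# beta = 1, where C is preloaded with the broadcast bias.
x = rng.standard_normal((m, k)).astype(np.float32)       # batch of inputs
w = rng.standard_normal((n, k)).astype(np.float32)       # weight matrix
bias = rng.standard_normal((n,)).astype(np.float32)
c_bias = np.broadcast_to(bias, (m, n)).astype(np.float32)
y = 1.0 * (x @ w.T) + 1.0 * c_bias                       # same GEMM form
```

Framing layers this way is why a fast GEMM is enough to accelerate most of a transformer's FLOPs: linear layers and attention projections all reduce to this one update.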