07 / Learning Path
Libraries & Frameworks
cuBLAS/cuDNN/Triton/PyTorch and profiling workflow.
Topic 01
cuBLAS GEMM and Tensor Core Usage
Theory
cuBLAS is NVIDIA's high-performance BLAS library, and GEMM (general matrix multiply) is the core operation behind most deep learning workloads.
When a GEMM's shapes, data types, and memory alignment match a supported hardware path, cuBLAS can route the work to Tensor Cores for substantial throughput gains.
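Whether a given GEMM lands on a Tensor Core kernel is decided inside cuBLAS, but NVIDIA's performance guidelines suggest dimension alignment matters. The sketch below is a purely illustrative heuristic (the function name, thresholds, and dtype table are assumptions for this example, not a cuBLAS API):

```python
def tensor_core_friendly(m, n, k, dtype="fp16"):
    """Illustrative heuristic, NOT a cuBLAS API: FP16/BF16 GEMMs tend to
    hit Tensor Core kernels most reliably when M, N, and K are multiples
    of 8 (16 for INT8). cuBLAS makes the real kernel choice internally."""
    multiple = {"fp16": 8, "bf16": 8, "int8": 16}.get(dtype, 4)
    return all(dim % multiple == 0 for dim in (m, n, k))

print(tensor_core_friendly(4096, 4096, 4096))  # True
print(tensor_core_friendly(4096, 4095, 4096))  # False: N not a multiple of 8
```

In practice you would confirm the chosen kernel with a profiler (e.g. Nsight Compute) rather than guessing from shapes.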
Key points
C = alpha * A * B + beta * C
- GEMM is the backbone of linear layers and attention projections.
- Tensor Cores execute matrix-tile math more efficiently than standard scalar FP32 pipelines.
- cuBLAS selects optimized kernels based on shape, datatype, and architecture.
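The GEMM contract above, C = alpha * A * B + beta * C, can be sketched in NumPy to make the semantics concrete (illustrative only; cuBLAS performs the same computation on column-major device arrays):

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Reference GEMM: returns alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)).astype(np.float32)
B = rng.standard_normal((3, 5)).astype(np.float32)
C = rng.standard_normal((4, 5)).astype(np.float32)

out = gemm(2.0, A, B, 0.5, C)

# Verify against an explicit triple loop over (i, j, k).
ref = np.empty_like(C)
for i in range(4):
    for j in range(5):
        acc = sum(A[i, k] * B[k, j] for k in range(3))
        ref[i, j] = 2.0 * acc + 0.5 * C[i, j]
assert np.allclose(out, ref, atol=1e-5)
```

A linear layer `y = x @ W.T + b` is exactly this operation with alpha = 1 and the bias folded into the beta * C term, which is why GEMM dominates its runtime.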