
Compilation Pipeline

From CUDA source code through PTX and SASS to execution on a specific GPU architecture.


Topic 01

CUDA C++ -> PTX -> SASS -> Binary Pipeline

Theory

This is the complete path from source code to GPU execution, with four major stages.

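You write CUDA C++ kernels in .cu files; nvcc compiles the device code to PTX, ptxas assembles that PTX into architecture-specific SASS, and the results are packaged into binaries (cubin or fatbin) that are loaded at run time. In an ordinary nvcc build these steps run automatically, but each one can also be invoked separately.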
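Running each hop by hand is a practical way to inspect the intermediate files. The Python sketch below is one way to do that, assuming the CUDA toolkit's nvcc, ptxas and cuobjdump are on PATH and an sm_80 target. The kernel, the file stem `scale`, and the helpers `pipeline_commands` and `run_pipeline` are hypothetical; without a toolkit the script only prints the commands it would run.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical Stage 1 source: a tiny kernel using the CUDA keywords above.
# Assumes it is launched with blockDim.x <= 256 so tile[] is large enough.
KERNEL_SRC = r"""
__global__ void scale(float *out, const float *in, float s, int n) {
    __shared__ float tile[256];                  // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = in[i];               // each thread uses its own slot
        out[i] = s * tile[threadIdx.x];
    }
}
"""

TOOLS = ("nvcc", "ptxas", "cuobjdump")


def pipeline_commands(stem: str, sm: int = 80) -> list[list[str]]:
    """Hypothetical helper: one argv list per stage, in pipeline order."""
    cu, ptx, cubin = f"{stem}.cu", f"{stem}.ptx", f"{stem}.cubin"
    return [
        # Stage 1 -> 2: compile CUDA C++ device code to PTX.
        ["nvcc", "-ptx", f"-arch=sm_{sm}", cu, "-o", ptx],
        # Stage 2 -> 3: ptxas assembles PTX into SASS, stored in a cubin.
        ["ptxas", f"-arch=sm_{sm}", ptx, "-o", cubin],
        # Disassemble the cubin to read the generated SASS.
        ["cuobjdump", "-sass", cubin],
    ]


def run_pipeline(stem: str = "scale", sm: int = 80) -> None:
    cmds = pipeline_commands(stem, sm)
    if not all(shutil.which(t) for t in TOOLS):
        # No CUDA toolkit on PATH: only show what would be run.
        for cmd in cmds:
            print(" ".join(cmd))
        return
    Path(f"{stem}.cu").write_text(KERNEL_SRC)
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing stage


if __name__ == "__main__":
    run_pipeline()
```

Keeping the commands as plain argv lists makes it easy to swap the target (for example `sm=90`) and compare the PTX and SASS that each architecture produces.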

Four stages

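Stage 1 - CUDA C++: kernel code written with CUDA extensions such as the __global__ and __shared__ qualifiers and the built-in variables threadIdx and blockIdx.
Stage 2 - PTX: a virtual ISA, i.e. a human-readable, assembly-like intermediate representation that targets a virtual architecture (for example compute_80) rather than a physical GPU.
Stage 3 - SASS: native machine instructions for one specific SM architecture (for example sm_80 or sm_90).
Stage 4 - cubin/fatbin: packaged binaries. A cubin is an ELF file holding SASS for a single architecture; a fatbin bundles several cubins and/or PTX so one binary can serve multiple targets.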
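A fatbin with several targets is typically requested through nvcc's -gencode option, once per target. The sketch below only builds the command line and does not run nvcc; the helper names `gencode_flags` and `fatbin_command`, the sm_80/sm_90 targets, and the choice to also embed compute_90 PTX are illustrative assumptions.

```python
def gencode_flags(sm_targets, ptx_target=None):
    """Hypothetical helper: nvcc -gencode flags for a multi-target fatbin."""
    flags = []
    for sm in sorted(set(sm_targets)):
        # One SASS image (cubin) per real architecture, generated from compute_XX PTX.
        flags.append(f"-gencode=arch=compute_{sm},code=sm_{sm}")
    if ptx_target is not None:
        # Also embed the PTX itself so newer GPUs can JIT-compile it at load time.
        flags.append(f"-gencode=arch=compute_{ptx_target},code=compute_{ptx_target}")
    return flags


def fatbin_command(stem, sm_targets, ptx_target=None):
    # -fatbin: device-only output, a single .fatbin holding all requested images.
    return ["nvcc", "-fatbin", *gencode_flags(sm_targets, ptx_target),
            f"{stem}.cu", "-o", f"{stem}.fatbin"]


if __name__ == "__main__":
    print(" ".join(fatbin_command("scale", [80, 90], ptx_target=90)))
```

Running the printed command with a real toolkit should produce scale.fatbin containing sm_80 and sm_90 SASS plus compute_90 PTX; `cuobjdump --list-elf` and `cuobjdump --list-ptx` can be used to check what was embedded. Including PTX for the newest target is a common design choice because PTX, unlike SASS, can be JIT-compiled for GPUs newer than any listed sm_ target.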

your_code.cu -> nvcc -> PTX -> ptxas -> SASS -> cubin/fatbin -> GPU executes
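At load time the CUDA driver picks an image from the fatbin for the GPU it finds. The toy model below is a simplification, not the driver's actual algorithm. It assumes the documented compatibility rules: SASS for sm_XY runs only on GPUs with the same major version X and a minor version of at least Y, and PTX for compute_XY can be JIT-compiled on any GPU with compute capability XY or higher. It further assumes the closest matching image is preferred and ignores architecture-specific targets such as sm_90a. `Fatbin` and `pick_image` are hypothetical names, and compute capabilities are written as integers such as 86 for 8.6.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fatbin:
    """Hypothetical summary of what a fatbin contains."""
    sass: Tuple[int, ...]  # real targets, e.g. (80, 90) for sm_80 and sm_90
    ptx: Tuple[int, ...]   # embedded PTX, e.g. (90,) for compute_90


def pick_image(fb: Fatbin, device_cc: int) -> Optional[Tuple[str, int]]:
    """Toy model of image selection for a device with compute capability device_cc."""
    major, minor = divmod(device_cc, 10)
    # SASS: same major version, minor version not newer than the device's.
    sass_ok = [sm for sm in fb.sass if sm // 10 == major and sm % 10 <= minor]
    if sass_ok:
        return ("sass", max(sass_ok))
    # Otherwise JIT-compile the newest PTX that is not newer than the device.
    ptx_ok = [p for p in fb.ptx if p <= device_cc]
    if ptx_ok:
        return ("ptx-jit", max(ptx_ok))
    return None  # no usable image for this GPU


if __name__ == "__main__":
    fb = Fatbin(sass=(80, 90), ptx=(90,))
    for cc in (80, 86, 90, 100, 75):
        print(cc, pick_image(fb, cc))
```

With sm_80 and sm_90 SASS plus compute_90 PTX, a compute capability 8.6 GPU runs the sm_80 SASS, a 9.0 GPU runs the sm_90 SASS, a 10.0 GPU falls back to JIT-compiling the compute_90 PTX, and a 7.5 GPU finds no usable image.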