Precision Tool
Warp Divergence Visualizer
Enter a CUDA branch condition and visualize how a warp serializes active and waiting lanes across execution passes.
KERNEL CONDITIONAL
Valid expression
EXECUTION METRICS
Efficiency Ratio
33.3%
Active Threads: 11/32
Waiting Threads: 21/32
Passes: 3
Overhead: 66.7% serialization
Warp Divergence
PASS 1
Executing the if-branch. Non-matching threads are waiting.
11 threads active, 21 threads waiting
PASS 2
Executing the else-branch. Previously active threads become idle.
21 threads active, 11 threads waiting
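The lane counts in the two passes can be reproduced on the host. The sketch below is not the visualizer's own code; it assumes a 32-lane warp and the example condition threadIdx.x % 3 == 0, and the helper names ifBranchMask and activeLanes are hypothetical. It builds the active mask for the if-branch pass and counts the set bits; the complement of that mask is the else-branch pass.

```cpp
#include <bitset>
#include <cstdint>

constexpr unsigned kWarpSize = 32;  // lanes per warp (assumed, as on current NVIDIA GPUs)

// Hypothetical helper: bit i is set when lane i satisfies the example
// condition threadIdx.x % 3 == 0, i.e. the lane is active in the if-branch pass.
inline std::uint32_t ifBranchMask() {
    std::uint32_t mask = 0;
    for (unsigned lane = 0; lane < kWarpSize; ++lane) {
        if (lane % 3 == 0) {
            mask |= (1u << lane);
        }
    }
    return mask;
}

// Number of lanes doing useful work in a pass executed under `mask`.
inline unsigned activeLanes(std::uint32_t mask) {
    return static_cast<unsigned>(std::bitset<32>(mask).count());
}
```

Calling activeLanes(ifBranchMask()) gives 11 and activeLanes(~ifBranchMask()) gives 21, matching the counts shown for pass 1 and pass 2.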
DIVERGENCE EXPLAINER
WHAT HAPPENED
Only 11 of the warp's 32 threads take the if-branch, so the warp runs the two branches one after the other. 21 threads sit idle during pass 1, then 11 sit idle during pass 2. Reported efficiency: 33.3%.
HOW TO FIX IT
Consider restructuring the data so that all threads in a warp process elements of the same type, which removes the modulo branch from within the warp. Some warp divergence is unavoidable, but minimizing it is key to high SIMT efficiency and throughput.
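One way to picture the restructuring is to move the decision from the thread index to the warp index, assuming the input has been regrouped so that each warp owns elements of a single type. The host-side sketch below is illustrative only: divergentWarps, perThreadCondition and perWarpCondition are hypothetical names, and the plain unsigned tid stands in for threadIdx.x in a one-dimensional block. It counts how many warps contain lanes that disagree on the branch.

```cpp
#include <cassert>
#include <functional>

constexpr unsigned kWarpSize = 32;  // lanes per warp (assumed)

// Counts the warps in a block of `numThreads` threads whose lanes disagree
// on the branch predicate `takesIf`, i.e. the warps that would diverge.
inline unsigned divergentWarps(unsigned numThreads,
                               const std::function<bool(unsigned)>& takesIf) {
    assert(numThreads % kWarpSize == 0);  // sketch handles whole warps only
    unsigned divergent = 0;
    for (unsigned base = 0; base < numThreads; base += kWarpSize) {
        unsigned taken = 0;
        for (unsigned lane = 0; lane < kWarpSize; ++lane) {
            if (takesIf(base + lane)) {
                ++taken;
            }
        }
        // A warp diverges only when some, but not all, lanes take the branch.
        if (taken != 0 && taken != kWarpSize) {
            ++divergent;
        }
    }
    return divergent;
}

// Original pattern: the branch depends on the thread index modulo 3.
inline bool perThreadCondition(unsigned tid) { return tid % 3 == 0; }

// Restructured pattern: the branch depends on the warp index, which assumes
// the data was regrouped so every lane of a warp handles the same type.
inline bool perWarpCondition(unsigned tid) { return (tid / kWarpSize) % 3 == 0; }
```

For a 96-thread block, divergentWarps(96, perThreadCondition) reports 3 divergent warps, while divergentWarps(96, perWarpCondition) reports 0. The cost of this approach is the regrouping step itself, so it pays off mainly when the branch bodies are expensive relative to the reordering.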
DETAILED METRICS
Efficiency Ratio: 33.3%
Active Threads: 11 / 32
Wasted Slots: 11
Execution Passes: 3
Serialization Overhead: 66.7%
Threads Taking If-Branch: 11
Threads Taking Else-Branch: 21
Branch Imbalance: 31.3%
Recommended Fix: Group similar data per warp to avoid modulo-driven divergence.
High divergence
Condition: threadIdx.x % 3 == 0
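The page does not state how Branch Imbalance is computed. The reported 31.3% is consistent with the absolute difference between the two branch populations divided by the warp size, so the sketch below uses that formula as an assumption; branchImbalancePercent is a hypothetical name, not part of the tool.

```cpp
#include <cassert>
#include <cstdlib>

// Assumed formula: |if lanes - else lanes| / (if lanes + else lanes), in percent.
inline double branchImbalancePercent(int ifLanes, int elseLanes) {
    const int warpLanes = ifLanes + elseLanes;
    assert(warpLanes > 0);  // at least one lane must take some branch
    return 100.0 * std::abs(ifLanes - elseLanes) / warpLanes;
}
```

With the values above, branchImbalancePercent(11, 21) evaluates to 31.25, which rounds to the 31.3% shown.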