Precision Tool

Warp Divergence Visualizer

Enter a CUDA branch condition to see how a warp serializes its divergent branch paths, and which lanes are active or waiting during each execution pass.

KERNEL CONDITIONAL

threadIdx.x % 3 == 0

EXECUTION METRICS

Efficiency Ratio

33.3%

Active Threads: 11/32

Waiting Threads: 21/32

Passes: 3

Overhead: 66.7% serialization

Warp View

PASS 1

Executing the if-branch. Non-matching threads are waiting.

11 threads active, 21 threads waiting

PASS 2

Executing the else-branch. Previously active threads become idle.

21 threads active, 11 threads waiting
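The two passes above can be reproduced with a small host-side simulation. This is a minimal sketch, not the tool's actual implementation: it assumes a 32-lane warp and a simplified model in which each non-empty branch path costs exactly one pass. Real hardware scheduling can differ (for example, under independent thread scheduling on newer GPUs). The name `simulate_passes` is hypothetical.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def simulate_passes(predicate, warp_size=WARP_SIZE):
    """Return one (label, active_lanes) entry per non-empty branch path.

    Simplified model (an assumption): the warp runs the if-path with only
    the matching lanes active, then the else-path with the remaining lanes.
    """
    lanes = range(warp_size)
    taken = [lane for lane in lanes if predicate(lane)]
    not_taken = [lane for lane in lanes if not predicate(lane)]
    passes = []
    if taken:
        passes.append(("if", taken))
    if not_taken:
        passes.append(("else", not_taken))
    return passes


# Condition from this page: threadIdx.x % 3 == 0
passes = simulate_passes(lambda lane: lane % 3 == 0)
for number, (label, active) in enumerate(passes, start=1):
    waiting = WARP_SIZE - len(active)
    print(f"Pass {number} ({label}-branch): {len(active)} active, {waiting} waiting")
# Pass 1 (if-branch): 11 active, 21 waiting
# Pass 2 (else-branch): 21 active, 11 waiting
```

A uniform condition (one that is true or false for every lane) yields a single pass in this model, which is the no-divergence case.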

DIVERGENCE EXPLAINER

WHAT HAPPENED

Only 11 of 32 threads take the if-branch. 21 threads sit idle during pass 1, then 11 sit idle during pass 2. Efficiency: 33.3%.

HOW TO FIX IT

Consider restructuring the data so that threads in the same warp process elements of the same type, so the modulo condition evaluates the same way across each warp. Some warp divergence is unavoidable, but minimizing it is key to high SIMT lane utilization and throughput.
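One way to picture the regrouping is to count how many warps contain a mix of both element types before and after reordering. The sketch below is illustrative only: the 96-element workload, the "type A" rule, and the helper `divergent_warps` are assumptions made for the example, and it models thread-to-element assignment on the host rather than running a kernel.

```python
WARP_SIZE = 32


def divergent_warps(flags, warp_size=WARP_SIZE):
    """Count warps whose lanes disagree on the branch condition."""
    count = 0
    for start in range(0, len(flags), warp_size):
        warp = flags[start:start + warp_size]
        if any(warp) and not all(warp):
            count += 1
    return count


def is_type_a(index):
    """Hypothetical element type: mirrors the page's modulo condition."""
    return index % 3 == 0


# Hypothetical workload: 96 elements, thread t handles element t.
indices = list(range(96))

# Original layout: every warp mixes both element types.
before = divergent_warps([is_type_a(i) for i in indices])

# Regrouped layout: a stable sort places all type-A elements first,
# so each warp sees only one element type.
regrouped = sorted(indices, key=lambda i: not is_type_a(i))
after = divergent_warps([is_type_a(i) for i in regrouped])

print(before, after)  # 3 0
```

In a real kernel the regrouping step has its own cost, and going through a reordered index array can make memory accesses less contiguous, so whether it pays off depends on how expensive the divergent branches are.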

DETAILED METRICS

Efficiency Ratio: 33.3%

Active Threads: 11 / 32

Wasted Slots: 11

Execution Passes: 3

Serialization Overhead: 66.7%

Threads Taking If-Branch: 11

Threads Taking Else-Branch: 21

Branch Imbalance: 31.3%

Recommended Fix: Group similar data per warp to avoid modulo-driven divergence.

High divergence

Condition: threadIdx.x % 3 == 0
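The branch counts and the branch imbalance figure can be checked by hand. The page does not state how it computes its metrics, so the formula below is an assumption: it takes the imbalance to be the absolute difference between the two branch counts as a fraction of the warp, which agrees with the reported 31.3% after rounding. The function names are hypothetical.

```python
WARP_SIZE = 32


def branch_counts(predicate, warp_size=WARP_SIZE):
    """Return (if_count, else_count) for one warp."""
    if_count = sum(1 for lane in range(warp_size) if predicate(lane))
    return if_count, warp_size - if_count


def branch_imbalance(if_count, else_count):
    """Assumed definition: absolute difference as a fraction of the warp."""
    return abs(if_count - else_count) / (if_count + else_count)


# Condition from this page: threadIdx.x % 3 == 0
if_count, else_count = branch_counts(lambda lane: lane % 3 == 0)
imbalance = branch_imbalance(if_count, else_count)
print(if_count, else_count, imbalance * 100)  # 11 21 31.25
```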