Warp Divergence Visualizer

Enter a CUDA branch condition and visualize how a warp serializes active and waiting lanes across execution passes.

KERNEL CONDITIONAL: threadIdx.x % 3 == 0

EXECUTION METRICS

Efficiency Ratio

33.3%

Active Threads: 11/32

Waiting Threads: 21/32

Passes: 2

Overhead: 66.7% serialization

Warp View

[Interactive lane grid: lanes 00 to 31, pass 1 of 2]

PASS 1

Executing the if-branch. Non-matching threads are waiting.

11 threads active, 21 threads waiting

PASS 2

Executing the else-branch. Previously active threads become idle.

21 threads active, 11 threads waiting
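The two passes can be modeled on the host as a pair of complementary 32-bit lane masks. The sketch below is plain C++ rather than device code, and it assumes a 32-lane warp whose lane index equals threadIdx.x (that is, the first warp of a one-dimensional block). The helper names `if_branch_mask` and `active_lanes` are hypothetical and are not part of the tool.

```cpp
#include <bitset>
#include <cstdint>

constexpr int kWarpSize = 32;  // assumed warp width

// Hypothetical helper: bit i is set when lane i satisfies lane % 3 == 0,
// i.e. the lanes that are active while the if-branch executes (pass 1).
inline std::uint32_t if_branch_mask() {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (lane % 3 == 0) {
            mask |= (std::uint32_t{1} << lane);
        }
    }
    return mask;
}

// Number of lanes set in a mask.
inline int active_lanes(std::uint32_t mask) {
    return static_cast<int>(std::bitset<32>(mask).count());
}
```

Under these assumptions, `active_lanes(if_branch_mask())` gives the pass 1 count and `active_lanes(~if_branch_mask())` gives the pass 2 count, since the else-branch runs on exactly the lanes the if-branch skipped.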

DIVERGENCE EXPLAINER

WHAT HAPPENED

Only 11 of 32 threads take the if-branch. 21 threads sit idle during pass 1, then 11 sit idle during pass 2. Efficiency: 33.3%.

HOW TO FIX IT

Consider restructuring the data so that threads in the same warp process elements of the same type, which eliminates the modulo branch. Warp divergence is sometimes unavoidable, but minimizing it is key to high lane utilization and throughput.
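One way to see the effect of grouping work per warp is to count how many warps in a block contain lanes that disagree on the branch condition. The sketch below is a host-side C++ model, not a kernel. It assumes a one-dimensional block and a 32-lane warp, and the names `divergent_warps`, `per_thread`, and `per_warp` are hypothetical. The `per_warp` predicate only helps if the data has been laid out so that each warp really does own elements of a single kind.

```cpp
#include <functional>

constexpr int kWarpSize = 32;  // assumed warp width

// Counts the warps in a 1-D block whose lanes do not all agree on pred(tid).
int divergent_warps(int block_size, const std::function<bool(int)>& pred) {
    int divergent = 0;
    for (int base = 0; base < block_size; base += kWarpSize) {
        const bool first = pred(base);
        for (int tid = base + 1; tid < base + kWarpSize && tid < block_size; ++tid) {
            if (pred(tid) != first) {
                ++divergent;  // at least two lanes take different paths
                break;
            }
        }
    }
    return divergent;
}

// Original condition: depends on the thread index within the warp.
bool per_thread(int tid) { return tid % 3 == 0; }

// Restructured condition: depends only on the warp index, so it is uniform
// across every lane of a given warp.
bool per_warp(int tid) { return (tid / kWarpSize) % 3 == 0; }
```

For a 96-thread block, `divergent_warps(96, per_thread)` reports every warp as divergent, while `divergent_warps(96, per_warp)` reports none, because each warp takes a single path.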

DETAILED METRICS

Efficiency Ratio: 33.3%

Active Threads: 11 / 32

Wasted Slots: 11

Execution Passes: 2

Serialization Overhead: 66.7%

Threads Taking If-Branch: 11

Threads Taking Else-Branch: 21

Branch Imbalance: 31.3%

Recommended Fix: Group similar data per warp to avoid modulo-driven divergence.

High divergence

Condition: threadIdx.x % 3 == 0