For AI Developers
GPU HUB
START HERE
This is the main GPU hub. From here you can jump into GPU architecture, execution internals, performance analysis, and practical tooling. The UI keeps a clean white layout while preserving the technical styling.
Learning Tracks
7 sections
Interactive Tools
15+ tools
Supported GPUs
50+ GPUs
GPU LEARNING PATH
Physical Hardware
Physical hardware foundations: SMs, cores, memory buses, and board design.
Explore
Memory Hierarchy
Registers, shared memory, L2, VRAM, and how data moves across tiers.
Explore
Execution Model
Warps, blocks, grids, scheduling behavior, and runtime execution flow.
Explore
Compilation Pipeline
From CUDA source to PTX, SASS, and final kernel binaries.
Explore
CUDA Programming
Kernel design, memory access patterns, synchronization, and tuning.
Explore
Driver Stack
CUDA driver/runtime interactions, scheduling layers, and device control.
Explore
Libraries & Frameworks
CUDA libraries and ML frameworks that map workloads onto GPU kernels.
Explore
PRECISION TOOLS
VRAM Calculator
Estimate memory footprint by parameters, precision, sequence length, and mode.
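A back-of-envelope version of that estimate can be sketched as below, assuming a simple weights-plus-KV-cache model for transformer inference. The parameter names and default shapes here are illustrative assumptions, not the tool's actual inputs:

```python
# Rough VRAM estimate for transformer inference (illustrative sketch).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_b, precision="fp16", seq_len=4096,
                     n_layers=32, n_kv_heads=8, head_dim=128,
                     batch=1, overhead=1.2):
    """Weights + KV cache, with a flat multiplier for activations/overhead."""
    weights = params_b * 1e9 * BYTES_PER_PARAM[precision]
    # KV cache: 2 tensors (K and V) per layer, per token, per sequence.
    kv = (2 * n_layers * n_kv_heads * head_dim
          * seq_len * batch * BYTES_PER_PARAM[precision])
    return (weights + kv) * overhead / 1e9

# e.g. a 7B-parameter model in fp16 with the default shapes above
print(round(estimate_vram_gb(7), 1))  # ≈ 17.4 GB
```

Real tools also account for optimizer states and gradients in training mode, which can triple or quadruple the weights term.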
Open VRAM Tool
Kernel Occupancy Estimator
Model active warps and identify register/shared-memory pressure quickly.
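The core of that calculation can be sketched as below: occupancy is the fraction of an SM's warp slots kept busy, with the resident block count capped by whichever resource (warp slots, registers, shared memory, or block slots) runs out first. The per-SM limits used here are typical Ampere-class numbers chosen for illustration, not values queried from a device:

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_warps_sm=48, max_blocks_sm=16,
              regs_per_sm=65536, smem_per_sm=102400, warp_size=32):
    """Fraction of the SM's warp slots kept busy by this launch config."""
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    # How many blocks fit per SM under each resource limit?
    by_warps = max_warps_sm // warps_per_block
    by_regs = regs_per_sm // (regs_per_thread * warps_per_block * warp_size)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_sm
    blocks = min(max_blocks_sm, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / max_warps_sm

# 256 threads/block, 64 registers/thread, 16 KiB shared memory/block:
# the register file limits residency to 4 blocks -> 32/48 warps active.
print(occupancy(256, 64, 16384))
```

Real hardware also rounds register and shared-memory allocations to architecture-specific granularities, so actual occupancy can be slightly lower than this sketch suggests.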
Open Occupancy Tool
GPU Picker
Shortlist GPU options for training, fine-tune, and high-throughput inference.
Open GPU Picker
Warp Divergence Visualizer
See how branch conditions split a warp into active and waiting execution passes.
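A toy model of that serialization, assuming SIMT-style execution where each non-empty side of a two-way branch costs one pass over the warp (the function and its interface are illustrative, not the visualizer's API):

```python
def divergence_passes(predicate, warp_size=32):
    """Count the serialized passes a warp needs for a 2-way branch.

    In SIMT execution, lanes taking the 'then' side and lanes taking
    the 'else' side cannot run together; each non-empty side costs a pass
    while the other side's lanes sit masked off and wait.
    """
    lanes = [predicate(lane) for lane in range(warp_size)]
    taken = sum(lanes)
    passes = (taken > 0) + (taken < warp_size)
    return passes, taken, warp_size - taken

# Uniform branch: all lanes agree -> one pass, no divergence.
print(divergence_passes(lambda lane: True))           # (1, 32, 0)
# Divergent branch: even/odd lanes split -> two serialized passes.
print(divergence_passes(lambda lane: lane % 2 == 0))  # (2, 16, 16)
```

This is why branching on `threadIdx.x % 2` halves throughput while branching on a value uniform across the warp costs nothing extra.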
Open Divergence Tool
Roofline Model Analyzer
Plot kernels against memory and compute ceilings to see the real bottleneck fast.
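The model behind that plot reduces to one line: attainable throughput is the lower of the compute ceiling and the memory ceiling at the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch using the H100 SXM5 figures from the hardware index (the 1,979 TFLOPS FP16 peak appears to be NVIDIA's with-sparsity number):

```python
def roofline_tflops(ai_flops_per_byte, peak_tflops=1979.0, bw_tbs=3.3):
    """Attainable TFLOPS = min(compute ceiling, memory ceiling)."""
    memory_bound = ai_flops_per_byte * bw_tbs  # TB/s * FLOP/byte = TFLOPS
    return min(peak_tflops, memory_bound)

# Ridge point: the arithmetic intensity where the two ceilings meet.
ridge = 1979.0 / 3.3
print(round(ridge))           # ~600 FLOPs/byte
print(roofline_tflops(10))    # low intensity -> memory-bandwidth bound
print(roofline_tflops(1000))  # high intensity -> compute bound
```

Kernels left of the ridge point gain nothing from more FLOPS; they need better data reuse or fewer bytes moved.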
Open Roofline Tool
HARDWARE INDEX
| GPU | Arch | VRAM | FP16 TFLOPS | Mem BW | Compute Capability | Best For |
|---|---|---|---|---|---|---|
| NVIDIA H100 SXM5 | Hopper | 80GB HBM3 | 1,979 (with sparsity) | 3.3 TB/s | 9.0 | Large-scale training |
| NVIDIA A100 SXM4 | Ampere | 80GB HBM2e | 312 | 2.0 TB/s | 8.0 | General DL workloads |
| RTX 4090 | Ada Lovelace | 24GB GDDR6X | 82.6 | 1.0 TB/s | 8.9 | Local inference |
| L40S | Ada Lovelace | 48GB GDDR6 | 183 | 0.8 TB/s | 8.9 | Multi-model serving |