How to use this page
- 1. Start with the workload or kernel you actually care about, not a synthetic example.
- 2. Use the roofline output to identify whether memory bandwidth or compute throughput is the first bottleneck.
- 3. Only then decide whether to change kernel structure, precision, memory access, or hardware.
- 4. Re-run after each change so performance work stays evidence-based.
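
The check in step 2 can be sketched in a few lines. This is a minimal illustration of the basic roofline model, not this page's actual implementation. The function name `roofline_bound`, its parameters, and the hardware numbers are all assumptions made up for the example. Replace them with measured FLOP and byte counts for your kernel and the peak figures for your hardware.

```python
def roofline_bound(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Classify a kernel as memory- or compute-bound under a simple roofline model.

    Hypothetical helper for illustration only; all inputs are supplied by the caller.
    """
    # Arithmetic intensity: work done per byte of memory traffic (FLOP/byte).
    intensity = flops / bytes_moved
    # Ridge point: the intensity at which the bandwidth roof meets the compute roof.
    ridge = peak_flops_per_s / peak_bytes_per_s
    # Attainable throughput is capped by whichever roof is lower at this intensity.
    attainable = min(peak_flops_per_s, intensity * peak_bytes_per_s)
    bound = "memory" if intensity < ridge else "compute"
    return {
        "intensity": intensity,
        "ridge": ridge,
        "attainable_flops_per_s": attainable,
        "bound": bound,
    }


# Assumed hardware: 10 TFLOP/s peak compute, 1 TB/s peak memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 1e12

# Example kernel: float32 vector add over 1e6 elements.
# 1 FLOP per element; 12 bytes per element (two 4-byte loads, one 4-byte store).
vec_add = roofline_bound(flops=1e6, bytes_moved=12e6,
                         peak_flops_per_s=PEAK_FLOPS, peak_bytes_per_s=PEAK_BW)
print(vec_add["bound"])  # memory
```

With these assumed numbers the ridge point is 10 FLOP/byte, so the vector add (about 0.083 FLOP/byte) sits far to the left of it. Step 3 would therefore point toward reducing memory traffic rather than changing precision or adding compute.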