10 hours ago
Why Measuring Time is Not Enough: a Practical Roofline Model for ML Training
When we want to speed up training, the first instinct is to measure time and start optimizing the slowest kernel. But raw measurements don’t tell the full story. We also need to understand the theoretical lower bound: how fast could this kernel run on the given hardware? Roofline estimates like this are tools for decision-making: should we use tensor parallelism (TP) or expert parallelism (EP) for this model, which kernel is hurting end-to-end throughput the most, and so on. So even when you’re sure that you can shave another 5% off a kernel in a week, don’t rush. Somewhere else in the system, something might be running 20x slower than its theoretical optimum. Your optimization effort is finite, so spend it where the gap is largest.
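To make the idea of a theoretical lower bound concrete, here is a minimal sketch of a roofline estimate, assuming the simplest form of the model: a kernel can finish no faster than the larger of its compute time and its memory-traffic time. The hardware peaks, the kernel names, and all measured numbers below are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def roofline_time(flops, bytes_moved, hw):
    """Lower bound on runtime: the slower of compute and memory traffic."""
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.peak_bandwidth
    return max(compute_time, memory_time)


def slowdown(measured_seconds, flops, bytes_moved, hw):
    """How many times slower the kernel runs than its roofline bound."""
    return measured_seconds / roofline_time(flops, bytes_moved, hw)


def rank_by_recoverable_time(kernels, hw):
    """Sort kernels by (measured - bound), largest gap first.

    Each kernel is a tuple: (name, measured_seconds, flops, bytes_moved).
    """
    def gap(kernel):
        _, measured, flops, bytes_moved = kernel
        return measured - roofline_time(flops, bytes_moved, hw)

    return sorted(kernels, key=gap, reverse=True)


# Hypothetical accelerator: 100 TFLOP/s compute, 1 TB/s memory bandwidth.
HW = Hardware(peak_flops=100e12, peak_bandwidth=1e12)

# Hypothetical measurements for two kernels in one training step.
KERNELS = [
    # name, measured seconds, FLOPs, bytes moved
    ("matmul", 1.05e-3, 1e11, 1e8),      # compute-bound, close to its bound
    ("elementwise", 2.0e-3, 1e8, 1e8),   # memory-bound, far from its bound
]

for name, measured, flops, bytes_moved in rank_by_recoverable_time(KERNELS, HW):
    bound = roofline_time(flops, bytes_moved, HW)
    ratio = slowdown(measured, flops, bytes_moved, HW)
    print(f"{name}: measured {measured * 1e3:.2f} ms, "
          f"bound {bound * 1e3:.2f} ms, {ratio:.1f}x from bound")
```

In this made-up example the matrix multiply is within about 5% of its bound, while the elementwise kernel sits roughly 20x away from its bound, so the ranking puts the elementwise kernel first. Real kernels also pay for launch overhead, communication, and imperfect overlap, so a bound like this is a floor to compare against, not a prediction.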
Source: HackerNoon →