News
5 days ago
Your PyTorch Model Is Slower Than You Think: This Is the Reason Why
We’ll cover three categories of hidden bottlenecks I measured on a real RTX 5060 training loop. None of them is in your model arch...
Sep 23, 2025
Building an H.264 Decoder with Nvidia CUDA
More than a decade after first experimenting with H.264 encoding, the author revisits the challenge of building a performant video...
Sep 13, 2025
Stop Waiting: Make XGBoost 46x Faster With One Parameter Change
XGBoost has built-in support for NVIDIA CUDA, so tapping into GPU acceleration doesn’t require new libraries or code rewrites. Thi...
