Help with a performance issue when upgrading CUDA.jl

Hi all. I have an issue that I’m hoping somebody here can help me with. It’s simple to state yet hard (at least for me) to diagnose: every time I upgrade to a version of CUDA.jl past 5.5.2, the performance of my code suffers. In some situations the slowdown is around 40%, but sometimes it’s a massive 5x or more! It was shocking the first time it happened, and still is, to tell the truth, and I have no idea what’s causing it or how to fix it. So I’m essentially stuck on version 5.5.2, because I can’t accept such a huge performance decrease after an upgrade.

A little about me. I am NOT a professional software developer, and I don’t have a degree. I’m basically a hobbyist, and essentially self-taught both in what I’m working on, which at the moment is machine learning, and in the tools I use, i.e., Julia and CUDA.jl. I can write basic GPU kernels with CUDA.jl using 1D, 2D, or 3D thread blocks, and most of the time they work as designed and give speedups of around 10x to 100x over my CPU code.

The code in question is my custom implementation of standard feed-forward network training with backpropagation, and I’ve written several GPU kernels to handle the computation of all the derivatives (or gradients, if you prefer) needed for the backprop portion of the algorithm. (Yes, I’m aware that automatic differentiation systems exist.) An example of what my modest code can do: it completes the entire training process on the full MNIST dataset of 70,000 28x28 grayscale images in about 70 ms. That’s with my “high-performance” hyperparameter setting (especially the hidden layer sizes and the batch size); the “high-accuracy” setting takes about 450 ms. But that’s using CUDA.jl 5.5.2. Any later version takes significantly longer: the most recent version I tested, 5.9.4, took about 100 ms and 625 ms, respectively.
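To give an idea of the kind of kernel I mean, here’s an illustrative sketch (not my actual code; the sigmoid derivative and the sizes are just made up for the example):

```julia
using CUDA

# Elementwise kernel with a 2D launch: computes the sigmoid derivative
# s .* (1 .- s) for a matrix of activations.
function sigmoid_deriv_kernel!(out, s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    if i <= size(s, 1) && j <= size(s, 2)
        @inbounds out[i, j] = s[i, j] * (1.0f0 - s[i, j])
    end
    return nothing
end

s   = CUDA.rand(Float32, 128, 256)   # made-up layer activations
out = similar(s)
threads = (16, 16)
blocks  = (cld(size(s, 1), 16), cld(size(s, 2), 16))
@cuda threads=threads blocks=blocks sigmoid_deriv_kernel!(out, s)
```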

What’s worse, after doing data augmentation to produce a dataset of 970,000 images (done on the CPU prior to training), 5.5.2 takes about 2.5 minutes to complete 20 epochs of training, while 5.9.4 takes a whopping 11 minutes for the same training! What’s going on here?! Something is seriously not right. How can there be such a huge difference in performance, especially for a minor release change and not a major one?

Please note, I am in no way criticizing the developers of CUDA.jl. I think it’s amazing what they’ve been able to accomplish, and I commend them for all the work that went into making GPU programming in Julia possible for someone like me. I just don’t understand how a minor release can make such a huge performance difference; I’d think a difference like this would be a head-scratcher even for a major release. I don’t know. Maybe it’s my system; maybe I just have a long-in-the-tooth video card and need to upgrade. In any event, I’d appreciate any feedback the community could provide. Thanks a bunch.

FYI: I do my programming on a laptop with a GTX 1660 Ti Max-Q.

Can you provide a minimal working example (MWE) that shows the slowdown?

Without a piece of code, it is not possible to debug this issue.

Hi, and thanks for the reply. I’m not sure how that would work, since the slowdown affects the entire program, which includes multiple GPU functions/kernels handling all the derivative computations. I could send you an example function if you think that would help. But that just gave me an idea: maybe I should benchmark the individual functions to see whether all of them are affected, or whether there’s just one problem child. I think I’ll do that and update this thread with what I find.

Yes, benchmarking every function individually is a good idea. However, benchmarking code that runs on the GPU is not trivial, as there’s also the data transfer time between the CPU and GPU to be taken into account. Feel free to ask if you’re unsure how to benchmark a function correctly.
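For instance, a pattern like the following usually works well (placeholder arrays, not your code): interpolate the arguments with $ so setup isn’t measured, and wrap the call in CUDA.@sync so you time the kernel itself rather than just the asynchronous launch.

```julia
using CUDA, BenchmarkTools

# Placeholder data, already resident on the GPU so host-device transfers
# aren't part of the measurement.
A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.rand(Float32, 1024, 1024)

# CUDA.@sync blocks until the GPU work has finished; $-interpolation keeps
# the construction of A and B out of the measured time.
@btime CUDA.@sync $A .* $B;
```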

Quick update: I’ve done a bit of testing on a whole slew of different GPU functions, using @btime CUDA.@sync, and I found that, surprisingly, one of the largest deltas between 5.5.2 and 5.9.4 occurred when broadcasting a function (written for scalar input) over a CuMatrix. So, to work around this, I think I’m going to have to write GPU functions/kernels that take the entire CuMatrix as input, instead of using broadcasting.
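For reference, this is roughly the comparison I have in mind, with a made-up scalar function standing in for one of mine (just a sketch, not my real code):

```julia
using CUDA, BenchmarkTools

# Made-up scalar function, stand-in for one of my activation/derivative functions.
myact(x) = x / (1.0f0 + abs(x))

X = CUDA.rand(Float32, 2048, 2048)

# Current approach: broadcast the scalar function over the CuMatrix.
@btime CUDA.@sync myact.($X);

# Planned approach: a kernel that takes the whole matrix and uses a grid-stride loop.
function myact_kernel!(out, x)
    idx    = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    for i in idx:stride:length(x)
        @inbounds out[i] = x[i] / (1.0f0 + abs(x[i]))
    end
    return nothing
end

out     = similar(X)
kernel  = @cuda launch=false myact_kernel!(out, X)
config  = launch_configuration(kernel.fun)
threads = min(length(X), config.threads)
blocks  = cld(length(X), threads)
@btime CUDA.@sync $kernel($out, $X; threads=$threads, blocks=$blocks);
```

If it turns out that broadcasting itself is what regressed between versions, a comparison like this should at least make the delta easy to demonstrate.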

I’ve done a little bit of benchmarking over the course of this little project, including with @btime, @benchmark, CUDA.@profile, and the external profiler Nsight Compute, but there is always something new to learn. I mostly just use @btime to get the relative difference in performance between functions.
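In case it’s useful to anyone else reading, this is roughly how I’ve been using the integrated profiler (placeholder function and data, not my actual code):

```julia
using CUDA

X = CUDA.rand(Float32, 2048, 2048)   # placeholder input
f(x) = x / (1.0f0 + abs(x))          # stand-in for one of my scalar functions

# Prints a summary of host/device time per kernel and per API call.
CUDA.@profile f.(X)

# trace=true instead lists every individual kernel launch and memory operation.
CUDA.@profile trace=true f.(X)
```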
