Hi everyone, I am experiencing unwanted behavior from CUDA and I was wondering if anyone has any idea about what is happening here.
Brief intro: I just installed Julia 1.7.2 on a new Windows machine. CUDA was working just fine. After updating the CUDA drivers (Release 510), I started having all sorts of issues.
Running
using Pkg; Pkg.test("CUDA")
resulted in a fatal error which forced Windows to reboot.
I then reverted to the previous NVIDIA driver version (Release 470). The CUDA tests now pass, but I am still experiencing several unwanted behaviors:
Thanks for the reply! After I upgraded to 510, any CUDA operation would crash the OS and force a reboot. I could not even extract any crash logs, since the machine would reboot automatically. Any thoughts on why this would be the case? I also tried wiping the Julia environment entirely and reinstalling everything, but I still got the same problem.
Regarding the second question, I am indeed using VSCode; I had no idea that could be the problem. I will try launching Julia from PowerShell instead to double-check.
I have some more info on this, hoping it helps.
The issues regarding computations are resolved by launching Julia from PowerShell instead of VSCode.
However, driver 510 still causes the exact same behavior, with the OS crashing. The stop code I receive is: PAGE_FAULT_IN_NONPAGED_AREA
The failure seems to occur in nvlddmkm.sys.
I know it might be hard (or even impossible) to figure out with this information, but I thought someone else might have experienced the same issues. Should I report this on the CUDA.jl GitHub page?
Also, CUDA.jl (both v3.8.0 and v3.9.0) works fine on Julia 1.6.4. I also performed a driver update, which moved me to CUDA 11.6, and still got the same results (it works properly on Julia 1.6.4 but crashes on Julia 1.7.2). CUDA.jl v3.9.0 also works fine on Julia 1.8.0-rc3.
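For anyone comparing setups across Julia versions like this, here is a minimal sketch for collecting the version details that are usually worth attaching to a report. The helper name `julia_report` is made up for illustration; `versioninfo` and `Pkg.status` come from Julia's standard libraries, and `CUDA.versioninfo()` is only mentioned in a comment because it requires CUDA.jl to load successfully.

```julia
using InteractiveUtils  # stdlib: provides versioninfo()
using Pkg               # stdlib: provides Pkg.status()

# Hypothetical helper (not part of Julia or CUDA.jl): capture the Julia and
# platform details as a string so they can be pasted into a post or issue.
julia_report() = sprint(versioninfo)

println(julia_report())

# Show the CUDA.jl version recorded in the active environment (if any).
Pkg.status("CUDA")

# With CUDA.jl loaded, its own report adds driver and toolkit details:
#   using CUDA
#   CUDA.versioninfo()
```

Running this once per Julia version (1.6.4, 1.7.2, 1.8.0-rc3) makes it easier to see which combination of Julia, CUDA.jl, and driver is involved in each case.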
This is unrelated, and probably a Julia bug. Try removing the cache of compiled packages (~/.julia/compiled). That is, unless you’re doing something peculiar, which the "Replacing module LLVM" message seems to indicate: are you using Revise after updating LLVM.jl, or do you have anything special in your startup.jl? In any case, this cannot be related to a driver upgrade.
Deleting the compiled packages directory (~/.julia/compiled/v1.7) fixed it! Thanks for the trick!
I’m not aware of any particular usage of LLVM.jl or Revise.jl. The only thing I can think of is that I had made the initial install of CUDA within a Project/Pkg environment (not the general 1.7 one).
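The cache-clearing step discussed above can also be done from Julia itself. The following is only a sketch: `compiled_cache_dir` and `clear_compiled_cache` are hypothetical helper names, and it assumes the cache lives under the first entry of `DEPOT_PATH` (normally `~/.julia`) in a `compiled/vMAJOR.MINOR` directory, as in the path mentioned in this thread.

```julia
# Hypothetical helper (not part of Julia or CUDA.jl): locate the precompile
# cache for a given Julia version inside a depot.
function compiled_cache_dir(depot::AbstractString = first(DEPOT_PATH),
                            version::VersionNumber = VERSION)
    return joinpath(depot, "compiled", "v$(version.major).$(version.minor)")
end

# Hypothetical helper: delete that cache so packages are precompiled again
# the next time they are loaded. With `dryrun = true` nothing is deleted.
# Returns the cache directory path in either case.
function clear_compiled_cache(depot::AbstractString = first(DEPOT_PATH),
                              version::VersionNumber = VERSION;
                              dryrun::Bool = false)
    dir = compiled_cache_dir(depot, version)
    if isdir(dir) && !dryrun
        rm(dir; recursive = true, force = true)
    end
    return dir
end
```

Calling `clear_compiled_cache(dryrun = true)` first shows which directory would be removed. It is probably safest to run the real deletion from a fresh session with no packages loaded and then restart Julia, so that everything is precompiled again on the next `using`.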