NVIDIA Driver Release 510 and Julia 1.7.2

Hi everyone, I am seeing unexpected behavior from CUDA.jl and was wondering if anyone has an idea of what is happening here.
Brief intro: I just installed Julia 1.7.2 on a new Windows machine, and at first CUDA.jl worked fine. After updating the NVIDIA drivers (Release 510), I started having all sorts of issues.
Running

Pkg.test("CUDA")

resulted in a fatal error which forced Windows to reboot.
I then reverted to the previous NVIDIA driver version (Release 470). The CUDA tests now pass, but I still get incorrect results:

julia> a = CUDA.fill(2.0f0,2^20)
1048576-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.0
2.0
0.0
2.0
⋮
2.0
0.0
2.0

And also:

julia> a = CUDA.rand(10)
10-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.0
0.0
0.0
0.0
⋮
0.0
0.0
0.0

Some specs:

julia> CUDA.version()
v"11.4.0"

julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
NVIDIA driver 472.39.0, for CUDA 11.4
CUDA driver 11.4
Libraries:

  • CUBLAS: 11.8.1
  • CURAND: 10.2.9
  • CUFFT: 10.7.0
  • CUSOLVER: 11.3.2
  • CUSPARSE: 11.7.1
  • CUPTI: 16.0.0
  • NVML: 11.0.0+472.39
  • CUDNN: 8.30.2 (for CUDA 11.5.0)
  • CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:

  • Julia: 1.7.2
  • LLVM: 12.0.1
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
  • Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

2 devices:
  0: Quadro RTX 5000 (sm_75, 14.624 GiB / 15.000 GiB available)
  1: Quadro RTX 5000 (sm_75, 14.025 GiB / 15.000 GiB available)

Any ideas? Where am I going wrong?

I’m using driver 510 without any issues, so I don’t know what could be causing this.

Which REPL are you evaluating this in? There are known issues with VSCode and CUDA.jl.

Thanks for the reply! After I upgraded to 510, any CUDA operation would crash the OS and force a reboot. I could not even retrieve any crash logs, since the machine rebooted automatically. Any thoughts on why this might happen? I also tried wiping the Julia environment entirely and reinstalling everything, but the problem persisted.

Regarding your second question: I am indeed using VSCode; I had no idea that could be the problem. I will try running Julia from PowerShell instead to double-check.

Sorry, no. I haven’t encountered such an issue.

I have some more information on this; I hope it helps.
The incorrect computation results go away when I launch Julia from PowerShell instead of VSCode.

However, driver 510 still causes exactly the same behavior: the OS crashes. The stop code I receive is PAGE_FAULT_IN_NONPAGED_AREA, and the failure appears to occur in nvlddmkm.sys.

I know it might be hard (or even impossible) to figure this out from the information given, but I thought someone else might run into the same issue. Should I report this on the CUDA.jl GitHub page?

No. CUDA.jl shouldn’t be able to crash your PC; this is strictly a bug in the NVIDIA driver or your hardware.

I’m not sure whether this is related, but I also ran into issues on WSL2 with Julia 1.7.2 and the latest CUDA.jl (v3.9.0):

I was able to install CUDA.jl v3.8.5, and it seems to work fine despite some warnings during installation:

Drivers info:

CUDA.jl also works fine on Julia 1.6.4, with both v3.8.0 and v3.9.0. I also updated the driver (which moved it to CUDA 11.6) and got the same results: it works properly on Julia 1.6.4 but crashes on Julia 1.7.2. Finally, CUDA.jl v3.9.0 works fine on Julia 1.8.0-rc3.

This is unrelated, and probably a Julia bug. Try removing the cache of compiled packages (~/.julia/compiled). Are you doing something unusual? The "Replacing module LLVM" message suggests so: are you using Revise after updating LLVM.jl, or do you have anything special in your startup.jl? Either way, this cannot be related to a driver upgrade.


Deleting the compiled packages under ~/.julia/compiled/v1.7 fixed it! Thanks for the tip!
I’m not aware of using LLVM.jl or Revise.jl in any particular way. The only thing I can think of is that I did the initial CUDA.jl install within a project environment rather than the default v1.7 environment.
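For anyone hitting the same problem, here is a minimal sketch of locating that cache from Julia itself. It assumes the default depot (the first entry of `DEPOT_PATH`, usually ~/.julia); `compiled_cache_dir` is a hypothetical helper for illustration, not part of Julia or CUDA.jl. The deletion line is left commented out on purpose, and is best run from a fresh session with no packages loaded (or done by hand with Julia closed).

```julia
# Hypothetical helper (not part of Julia or CUDA.jl): build the path of the
# precompilation cache for the running Julia version, e.g. ~/.julia/compiled/v1.7.
# Assumes the default depot, i.e. the first entry of DEPOT_PATH.
function compiled_cache_dir(depot::AbstractString = first(DEPOT_PATH))
    return joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)")
end

cache = compiled_cache_dir()
println("Precompilation cache: ", cache)

# From a fresh session with no packages loaded, uncomment to delete the cache.
# isdir(cache) && rm(cache; recursive = true)
```

Packages are recompiled automatically the next time they are loaded, so expect the first `using CUDA` afterwards to take a while.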