cuDNN, julia-1.10 and linux

I have trained a ResNet18 using Flux on julia-1.10 rc1. In fact I have trained multiple models with no trouble at all, but this week, when I try to predict using my model, I get a compilation error for my private package:

julia> using Skraak
┌ Error: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0)
└ @ cuDNN ~/.julia/packages/cuDNN/E0AFc/src/cuDNN.jl:172

Prediction then fails with a stack trace, cuDNN again.

This would be no problem: I reinstalled julia 1.9.4, where cuDNN is available, but there my jld2 file does not load because it has a different structure.

(I thought jld2 was the safe way to save a model, alas no. I am using Metalhead in there and maybe that is where the structure change snuck in, not sure.)

So now I am stuck, can’t use a model that took 5 days on a GPU to train.

Question: when will cuDNN become available for my platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0), or is there another way to solve this problem?

Since I managed to train and run several models before this error surfaced, it is most likely due to a change in a Project.toml somewhere upstream. It can likely be fixed very easily by a package maintainer somewhere.

I really want to publish my package sometime so that others can use it, but a stable API is very important.

Regards
David


I am having a similar problem on julia 1.9.4 while trying to install an older version of CUDA, 4.1-4.4 (the current version runs out of GPU memory, while it worked with the older version).

I am wondering whether the issue comes from the none in the platform name. Maybe the CUDA version is not recognized?

I also see this error:

 NNlibCUDACUDNNExt [ab3ce674-22af-5de9-b6c7-795b17302dcb]
│  ┌ Error: CUDA.jl could not find an appropriate CUDA runtime to use.
│  │ 
│  │ This can have several reasons:
│  │ * you are using an unsupported platform: this version of CUDA.jl
│  │   only supports Linux (x86_64, aarch64, ppc64le) and Windows (x86_64),
│  │   while your platform was identified as x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+local-julia_version+1.9.4;
│  │ * you precompiled CUDA.jl in an environment where the CUDA driver
│  │   was not available (i.e., a container, or an HPC login node).
│  │   in that case, you need to specify which CUDA version to use
│  │   by calling `CUDA.set_runtime_version!`;
│  │ * you requested use of a local CUDA toolkit, but not all
│  │   required components were discovered. try running with
│  │   JULIA_DEBUG=all in your environment for more details.
│  │ 
│  │ For more details, refer to the CUDA.jl documentation at
│  │ https://cuda.juliagpu.org/stable/installation/overview/
│  └ @ CUDA ~/src/DiffModel-test4.jl/depot/packages/CUDA/p5OVK/src/initialization.jl:

Running this command did not resolve the problem for me:

julia> CUDA.set_runtime_version!()
[ Info: Reset CUDA Runtime version preference, please re-start Julia for this to take effect.

The best place to troubleshoot issues like these is the CUDA.jl issue tracker. I know there are a million and a half edge cases around discovering the right libraries, so the recommendation from the maintainers has been to open issues with enough information to repro the problem or otherwise deduce what might be happening on your machines.

(The ML category is probably not the best place since we don’t know what’s going on either, and the GPU library maintainers aren’t actively watching here)

Thanks for your replies.

I have other stuff going on at the moment; as soon as I get time I will try a few more things and share the results somewhere more appropriate.

Julia is a really great language to work with and I am always learning new stuff.

Regards
David

I upgraded to julia 1.10 rc2, had a clean out and now it all works well again. cuDNN is back.

Kindness
David

Hello, I was wondering whether you have got the problem again? It seems I am now having exactly the same problem: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0), but CUDA and cuDNN both have been installed fine.

Hi

Yes, I still have this problem. It went away with 1.10 rc2, but when I upgraded to 1.10 it came back again. I have not checked for around a month though.

Extremely frustrating as I am unable to get any real work done.

Regards
David

Yeah, I am having exactly the same error. I have rolled back to 1.9.4 and it works fine. I have opened an issue on CUDA.jl’s GitHub and hope it can be resolved.

Lei

Good news! Thanks to Tim Besard, there is an easy solution. See here for details: cuDNN not available for your platform · Issue #2252 · JuliaGPU/CUDA.jl · GitHub

TL;DR

First, remove the compile cache:

rm -rf ~/.julia/compiled/v1.10/cuDNN
rm -rf ~/.julia/compiled/v1.10/CUDNN_jll

Then restart Julia and start over, i.e. add CUDA and cuDNN again.
After that, everything should work fine.
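
If your depot is not in ~/.julia (one of the logs earlier in this thread shows a custom depot path) or you are on a different Julia version, the same cleanup can be wrapped in a small shell function. This is only a sketch: the name clear_compile_cache and its arguments are made up for illustration and are not part of Julia or CUDA.jl, and it assumes the usual layout of <depot>/compiled/<version>/<package>.

```shell
# Hypothetical helper (not part of Julia or CUDA.jl): remove the precompile
# cache directories of the given packages.
# Usage: clear_compile_cache <depot dir> <version dir, e.g. v1.10> <package>...
clear_compile_cache() {
    depot="$1"
    version_dir="$2"
    shift 2
    for pkg in "$@"; do
        # Assumed layout: <depot>/compiled/<version>/<package>
        target="$depot/compiled/$version_dir/$pkg"
        if [ -d "$target" ]; then
            rm -rf -- "$target"
            echo "removed $target"
        else
            echo "no cache at $target"
        fi
    done
}

# Equivalent to the two rm commands above, assuming the default depot:
# clear_compile_cache "$HOME/.julia" v1.10 cuDNN CUDNN_jll
```

The call is left commented out so that nothing is deleted until you have checked the depot path and version directory for your own setup.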


Thanks!

I really don't want to go back to 1.9

Kindness
David

Thanks a million. It worked for me. I have successfully fine tuned my model on GPU.

Now I can work on my prediction loop again, at last. I need to predict over TB of audio, not having a GPU is a fate worse than death. I had to go back to a python library.

Kindness
David