I trained a ResNet18 using Flux on Julia 1.10-rc1. In fact, I have trained multiple models with no trouble at all, but this week, when I tried to predict using my model, I got a compilation error for my private package:
julia> using Skraak
┌ Error: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0)
└ @ cuDNN ~/.julia/packages/cuDNN/E0AFc/src/cuDNN.jl:172
Prediction then fails with a stack trace, again pointing at cuDNN.
That alone would be no problem: I reinstalled Julia 1.9.4, where cuDNN is available, but there my JLD2 file does not load because the saved model has a different structure.
(I thought JLD2 was the safe way to save a model; alas, no. I use Metalhead in the model, and maybe that is where the structure change snuck in, but I am not sure.)
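A pattern that is usually less sensitive to package upgrades is to store only the model's parameter state rather than the model structs themselves. The sketch below assumes Flux 0.14 or later (where `Flux.state` and `Flux.loadmodel!` exist) and uses a small hypothetical `build_model` in place of the real Metalhead ResNet18. The saved state still has to match the layer layout of the rebuilt model, but it no longer depends on the Julia struct definitions that JLD2 serializes. If the old file can still be opened in an environment pinned to the exact Manifest used for training, extracting its state there and re-saving it this way might also be a route to recovering the trained weights.

```julia
using Flux, JLD2

# Hypothetical stand-in for the real network; in practice this would rebuild the
# same Metalhead ResNet18 (same constructor arguments) that was trained.
build_model() = Chain(Dense(10 => 5, relu), Dense(5 => 2))

model = build_model()
# ... training happens here ...

# Save only the parameters and layer state (nested NamedTuples of arrays),
# not the model structs themselves.
model_state = Flux.state(model)
jldsave("model_state.jld2"; model_state)

# Later, possibly after package upgrades: rebuild the architecture in code,
# then copy the saved arrays into the fresh model.
restored = build_model()
Flux.loadmodel!(restored, JLD2.load("model_state.jld2", "model_state"))
```

The architecture lives in code and the file holds only arrays, so upgrading Flux or Metalhead should only break loading if the layer layout itself changes.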
So now I am stuck: I can't use a model that took 5 days on a GPU to train.
Question: when will cuDNN become available for my platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0), or is there another way to solve this problem?
Since I managed to train and run several models before this error surfaced, it is most likely caused by a change in a Project.toml somewhere upstream, which a package maintainer could probably fix easily.
I really want to publish my package at some point so others can use it, but for that a stable API is essential.
I am having a similar problem on Julia 1.9.4 while trying to install an older version of CUDA.jl (4.1 to 4.4); the current version runs out of GPU memory, whereas the older versions worked.
I am wondering whether the issue comes from the "none" in the platform name (cuda+none). Maybe the CUDA version is not recognized?
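One way to look at that tag is to parse the platform strings from the error messages with Julia's standard `Base.BinaryPlatforms` module. This minimal sketch uses the triplet from the cuDNN error on Julia 1.10 and the one from the CUDA.jl error on Julia 1.9.4 below. My reading that `cuda+none` means no CUDA runtime was selected for the session (while `cuda+local` means a local toolkit was requested) is an assumption about how the CUDA JLL packages tag the platform, not something the error messages state.

```julia
using Base.BinaryPlatforms: Platform, tags

# Triplets copied verbatim from the two error messages in this thread.
failing  = parse(Platform, "x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0")
local_tk = parse(Platform, "x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+local-julia_version+1.9.4")

# `tags` returns the Dict{String,String} of all key/value pairs in the platform,
# including the extra "cuda" tag that is presumably added by the CUDA JLLs.
println("Julia 1.10 session:  cuda = ", tags(failing)["cuda"])
println("Julia 1.9.4 session: cuda = ", tags(local_tk)["cuda"])
```

A cuDNN artifact is selected by matching this tag, so a value of "none" would plausibly explain why no cuDNN build matches the platform.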
I also see this error:
NNlibCUDACUDNNExt [ab3ce674-22af-5de9-b6c7-795b17302dcb]
│ ┌ Error: CUDA.jl could not find an appropriate CUDA runtime to use.
│ │
│ │ This can have several reasons:
│ │ * you are using an unsupported platform: this version of CUDA.jl
│ │ only supports Linux (x86_64, aarch64, ppc64le) and Windows (x86_64),
│ │ while your platform was identified as x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+local-julia_version+1.9.4;
│ │ * you precompiled CUDA.jl in an environment where the CUDA driver
│ │ was not available (i.e., a container, or an HPC login node).
│ │ in that case, you need to specify which CUDA version to use
│ │ by calling `CUDA.set_runtime_version!`;
│ │ * you requested use of a local CUDA toolkit, but not all
│ │ required components were discovered. try running with
│ │ JULIA_DEBUG=all in your environment for more details.
│ │
│ │ For more details, refer to the CUDA.jl documentation at
│ │ https://cuda.juliagpu.org/stable/installation/overview/
│ └ @ CUDA ~/src/DiffModel-test4.jl/depot/packages/CUDA/p5OVK/src/initialization.jl:
Running this command did not resolve the problem for me:
julia> CUDA.set_runtime_version!()
[ Info: Reset CUDA Runtime version preference, please re-start Julia for this to take effect.
The best place to troubleshoot issues like these is the CUDA.jl issue tracker. I know there are a million and a half edge cases around discovering the right libraries, so the recommendation from the maintainers has been to open issues with enough information to reproduce the problem or otherwise deduce what might be happening on your machines.
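As a starting point for such an issue, a minimal sketch of collecting the basic environment details. It assumes CUDA.jl is installed in the active project; `CUDA.functional(true)` printing the reason for a failure is my understanding of CUDA.jl's API and may differ between releases, so check the documentation for the installed version.

```julia
# Start Julia with extra logging, as the CUDA.jl error message suggests:
#   JULIA_DEBUG=all julia --project
using InteractiveUtils, Pkg
using CUDA

versioninfo()                            # Julia version, OS, CPU
Pkg.status(mode = Pkg.PKGMODE_MANIFEST)  # exact versions of CUDA.jl, cuDNN.jl and their JLLs

if CUDA.functional()
    CUDA.versioninfo()                   # driver, runtime and library versions CUDA.jl picked up
else
    CUDA.functional(true)                # print why CUDA.jl considers itself non-functional
end
```

Pasting this output, together with the full error text, gives maintainers the Julia version, the resolved package versions, and what CUDA.jl itself detected.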
(The ML category is probably not the best place, since we don't know what's going on either, and the GPU library maintainers aren't actively watching here.)
Hello, have you run into this problem again? I now seem to have exactly the same problem: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.10.0), even though CUDA and cuDNN both installed fine.
Yes, I still have this problem. It went away with 1.10-rc2, but when I upgraded to 1.10 it came back. I have not checked in about a month, though.
Extremely frustrating as I am unable to get any real work done.
Yeah, I am having exactly the same error. I have rolled back to 1.9.4 and it works fine. I have opened an issue on CUDA.jl's GitHub and hope it can be resolved.
Thanks a million. It worked for me. I have successfully fine-tuned my model on the GPU.
Now I can work on my prediction loop again, at last. I need to predict over terabytes of audio, and not having a GPU is a fate worse than death. In the meantime I had to go back to a Python library.