I apologize in advance for my question, because I know it is not very specific, but at this point I do not know where else to go for help.
I’ve been trying to compile some CUDA code using BinaryBuilder, but I haven’t succeeded. I really don’t think I need anything fancy to compile my code, but I do need nvcc, and this is what I have been struggling with. Here is what I have tried so far:
Trying to load CUDA as a dependency
I tried copying an existing build_tarballs.jl, such as NCCL’s build_tarballs.jl, but I can’t even get that to run. When I try to run NCCL’s script with julia build_tarballs.jl --verbose --debug (from a locally cloned copy of Yggdrasil), I get an error:
ERROR: LoadError: KeyError: key v"11.4.4" not found
where 11.4.4 is the CUDA version that the build is trying to install. The problem seems to be that CUDA.required_dependencies (from platforms/cuda.jl) returns dependencies that can’t be found. Here is a concrete example:
For the target x86_64-linux-gnu-cuda+11.4, that is, platform = Platform("x86_64", "linux"; libc="glibc", cuda="11.4"), CUDA.required_dependencies(platform) returns the following two dependencies:
BuildDependency(PackageSpec(name="CUDA_SDK_jll", version=v"11.4.4"))
BuildDependency(PackageSpec(name="CUDA_Runtime_jll"))
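For anyone who wants to reproduce the call outside a full recipe, here is a minimal sketch. It assumes a local Yggdrasil clone at a hypothetical path (YGGDRASIL_DIR below is a placeholder, not a real location) and that platforms/cuda.jl can be included on its own, which I have not verified for every Yggdrasil revision:

```julia
using BinaryBuilder, Pkg

# Hypothetical path to a local Yggdrasil clone; adjust to your checkout.
const YGGDRASIL_DIR = "/path/to/Yggdrasil"

# Defines the CUDA helper module used by recipes such as NCCL's.
include(joinpath(YGGDRASIL_DIR, "platforms", "cuda.jl"))

# Same platform as in the example above.
platform = Platform("x86_64", "linux"; libc="glibc", cuda="11.4")

# Print each dependency the helper requests for this platform.
for dep in CUDA.required_dependencies(platform)
    println(dep)
end
```

Printing the dependencies this way makes it easy to compare the requested versions against what is actually registered.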
But then when I run build_tarballs with these dependencies, I get the KeyError (key v"11.4.4" not found). So I figured I would pin the exact version string that I found in the JuliaBinaryWrappers organization on GitHub. At this point, I have the following:
platforms = [Platform("x86_64", "linux"; libc="glibc", cuda="12.3")]
dependencies = [
    # We need cmake >= 3.18, but by default it is 3.17.2
    HostBuildDependency(PackageSpec(; name="CMake_jll", version=v"3.28.1+0")),
    BuildDependency(PackageSpec(name="CUDA_SDK_jll", version=v"12.3.2+0")),
    RuntimeDependency(PackageSpec(name="CUDA_Runtime_jll")),
]
But now when I get dropped into the sandbox with these dependencies, nvcc doesn’t work. I tried compiling a hello world, but I get the following error:
/opt/x86_64-linux-gnu/bin/../lib/gcc/x86_64-linux-gnu/4.8.5/../../../../x86_64-linux-gnu/bin/ld: cannot find -lcudadevrt
/opt/x86_64-linux-gnu/bin/../lib/gcc/x86_64-linux-gnu/4.8.5/../../../../x86_64-linux-gnu/bin/ld: cannot find -lcudart_static
This led me to an issue for NCCL’s build, but nothing I found there seemed to solve my problem. Since the error mentions cudart_static, I also tried adding CUDA_SDK_static_jll as a dependency, but I still get the same error from ld when running nvcc. I also tried adding CUDA_full_jll, but that didn’t help either. I feel like I am very close with this setup, but I cannot get cudadevrt or cudart_static to be found. (I did of course try adding $prefix/cuda/lib to LD_LIBRARY_PATH, but that did not do anything either.)
Trying to manually install CUDA in the sandbox
Then I figured I might as well just install CUDA from inside the sandbox environment. I found this recipe and started copying it, but it has install scripts for CUDA 10, which I don’t think supports the sm_80 compute architecture that I need. I tried to use more up-to-date files:
sources = [
    FileSource("https://us.download.nvidia.com/XFree86/Linux-x86_64/550.40.07/NVIDIA-Linux-x86_64-550.40.07.run", "298936c727b7eefed95bb87eb8d24cfeef1f35fecac864d98e2694d37749a4ad"),
    FileSource("https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run", "24b2afc9f770d8cf43d6fa7adc2ebfd47c4084db01bdda1ce3ce0a4d493ba65b"),
]
but I’ve honestly had a hard time trying to install CUDA this way within the sandbox. This is where I am stuck currently.
Is there a better way of doing this? Is there a certain set of dependencies that gives me access to nvcc, or do I have to install CUDA myself in the sandbox?
Thank you!