Base packages for all users

Julia’s CUDA.jl package downloads its own CUDA toolkit. The only things that need to be present to run CUDA code are the GPU hardware and the respective drivers (which need to be installed via sudo).

-erik

I expect that others have pointed this out already, but it’s worth repeating: to make using Julia packages (and Python packages, Docker images, etc.) a good experience for users, there needs to be a file system where they can store a large amount of data (100 GByte?) and which handles many small files well.

This file system does not need to be backed up. Neither the typical home directory nor a parallel file system (e.g. GPFS) is a good choice for this.
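One way to act on this is to point each user's Julia depot (packages, artifacts, compiled caches) at such a file system. A minimal sketch, assuming a site scratch area exposed via a `$SCRATCH` variable (the paths here are hypothetical; adjust to your site):

```shell
# Redirect the Julia depot away from $HOME to a large, un-backed-up
# scratch area that tolerates many small files.
# $SCRATCH is assumed to be set by the site; fall back to /tmp for illustration.
SCRATCH="${SCRATCH:-/tmp/scratch-${USER:-$(id -un)}}"
export JULIA_DEPOT_PATH="$SCRATCH/julia-depot"
mkdir -p "$JULIA_DEPOT_PATH"
```

Putting these lines in a site-wide profile script (or a module file) makes the redirection transparent to users.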

-erik


@RonRahaman, may I suggest an alternative solution? You can create a Docker image that already contains the required packages, and tell users to always use the “latest” tag. That would let you update and test the Julia version and package installation without affecting the users.

My Dockerfile looks like this:

Dockerfile
FROM julia:1.7.1
ENV LD_LIBRARY_PATH=$JULIA_PATH/lib:/home/libs
WORKDIR /home
RUN julia -e 'import Pkg; Pkg.add(["Plots", "AbstractFFTs", "SpecialFunctions"])'
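For completeness, the build-and-publish side could look like the following. The image and registry names are hypothetical; substitute your site's registry:

```shell
# Build the image and tag it "latest" so users always get the current
# tested combination of Julia version and packages.
docker build -t mysite/julia-base:latest .
docker push mysite/julia-base:latest

# Users then run it without caring which Julia version is inside:
docker run --rm -it mysite/julia-base:latest julia
```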

Yeah, I actually discovered that the system CUDA toolkit (10.5) caused issues with CUDA.jl, whereas the toolkit installed by CUDA.jl (11.5) works great. Seems like there’s no benefit to using the system CUDA toolkit, so I might not even install CUDA.jl sitewide.

But it seems like it still makes sense to install MPI.jl sitewide so I can force Julia to use our site’s MPI libraries. Our MPI installation has some configuration options for the scheduler, the network, and so on. And so far, using MPI.jl with our site MPI libraries works great.

When Julia installs its own MPI libraries, is it good at inferring site configuration options?

You have to explicitly tell it to use the system MPI: Configuration · MPI.jl
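For MPI.jl v0.19 (the release current with Julia 1.7), that configuration is driven by environment variables set before building the package. A sketch, assuming the site exposes its MPI through a module (the module name here is hypothetical):

```shell
# Select the site's MPI instead of the JLL-provided binary.
module load openmpi                 # hypothetical site module
export JULIA_MPI_BINARY=system      # tell MPI.jl to wrap the system libmpi
julia -e 'using Pkg; Pkg.build("MPI"; verbose=true)'
```

Doing this once in the sitewide depot means users inherit the correctly configured MPI.jl without rebuilding it themselves.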


Yes, I did that, and everything’s working well.

One use case I’m interested in is building against the NVHPC SDK. My main reason is that the MPI libraries in NVHPC are CUDA-aware. I’m not sure whether MPI.jl will preferentially download a CUDA-aware MPI build; and I’m pretty sure that if I use MPI from NVHPC, I should also use CUDA from NVHPC. So I think I’d like to be able to build with settings resembling this:

module load nvhpc/22.1
export JULIA_MPI_BINARY="system"
export JULIA_MPI_PATH=$NVHPC_ROOT/comm_libs/mpi
export JULIA_CUDA_USE_BINARYBUILDER=false
export CUDA_ROOT=$NVHPC_ROOT/cuda
export JULIA_DEBUG=CUDA

Unfortunately, I’m getting some pretty catastrophic errors when I install CUDA.jl like this and try to run CUDA.version(). I get a huge backtrace with this repeated (sorry if I’m clipping this snippet wrong):

_jl_invoke at /storage/coda1/pace-admins/manual/src/julia-1.7.2/src/gf.c:2247 [inlined]
jl_apply_generic at /storage/coda1/pace-admins/manual/src/julia-1.7.2/src/gf.c:2429
macro expansion at /storage/home/hcodaman1/rrahaman6/.julia/packages/CUDA/Axzxe/lib/cudadrv/libcuda.jl:5 [inlined]
macro expansion at /storage/home/hcodaman1/rrahaman6/.julia/packages/CUDA/Axzxe/lib/cudadrv/error.jl:97 [inlined]
cuGetErrorString at /storage/home/hcodaman1/rrahaman6/.julia/packages/CUDA/Axzxe/lib/utils/call.jl:26 [inlined]
description at /storage/home/hcodaman1/rrahaman6/.julia/packages/CUDA/Axzxe/lib/cudadrv/error.jl:53 [inlined]
showerror at /storage/home/hcodaman1/rrahaman6/.julia/packages/CUDA/Axzxe/lib/cudadrv/error.jl:60
#showerror#813 at ./errorshow.jl:88
showerror##kw at ./errorshow.jl:87 [inlined]
showvalue at /storage/coda1/pace-admins/manual/src/julia-1.7.2/usr/share/julia/stdlib/v1.7/Logging/src/ConsoleLogger.jl:56
unknown function (ip: 0x7fff9617d518)
_jl_invoke at /storage/coda1/pace-admins/manual/src/julia-1.7.2/src/gf.c:2247 [inlined]
jl_apply_generic at /storage/coda1/pace-admins/manual/src/julia-1.7.2/src/gf.c:2429
#handle_message#3 at /storage/coda1/pace-admins/manual/src/julia-1.7.2/usr/share/julia/stdlib/v1.7/Logging/src/ConsoleLogger.jl:134
handle_message##kw at /storage/coda1/pace-admins/manual/src/julia-1.7.2/usr/share/julia/stdlib/v1.7/Logging/src/ConsoleLogger.jl:109
unknown function (ip: 0x7fff96176c54)