Precompiling CUDAnative on login node (works on compute node)

This is a follow-up to this post about building CUDAnative on a login node. Building CUDAnative on the login nodes works with the master version. Unfortunately, the next problem now arises: how can we precompile CUDAnative on the login node?
As in that post, precompiling CUDAnative immediately aborts because CUDAdrv cannot be initialized. Note that CUDAdrv itself also aborts with an error when calling using CUDAdrv, but this seems to occur only after its precompilation has completed.

julia> using CUDAdrv
[ Info: Precompiling CUDAdrv [c5f51814-7f29-56b8-a69c-e4d8f6be1fde]
ERROR: InitError: CUDA error: unknown error (code #999, ERROR_UNKNOWN)
Stacktrace:
 [1] macro expansion at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/base.jl:145 [inlined]
 [2] init(::Int64) at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/init.jl:10 (repeats 2 times)
 [3] __init__() at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:83
 [4] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:633
 [5] _require_from_serialized(::String) at ./loading.jl:684
 [6] _require(::Base.PkgId) at ./loading.jl:967
 [7] require(::Base.PkgId) at ./loading.jl:858
 [8] require(::Module, ::Symbol) at ./loading.jl:853
during initialization of module CUDAdrv

julia> using CUDAnative
[ Info: Precompiling CUDAnative [be33ccc6-a3ff-5ff2-a52e-74243cff1e17]
ERROR: LoadError: InitError: CUDA error: unknown error (code #999, ERROR_UNKNOWN)
Stacktrace:
 [1] macro expansion at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/base.jl:145 [inlined]
 [2] init(::Int64) at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/init.jl:10 (repeats 2 times)
 [3] __init__() at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:83
 [4] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:633
 [5] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:713
 [6] _require(::Base.PkgId) at ./loading.jl:937
 [7] require(::Base.PkgId) at ./loading.jl:858
 [8] require(::Module, ::Symbol) at ./loading.jl:853
 [9] include at ./boot.jl:317 [inlined]
 [10] include_relative(::Module, ::String) at ./loading.jl:1044
 [11] include(::Module, ::String) at ./sysimg.jl:29
 [12] top-level scope at none:2
 [13] eval at ./boot.jl:319 [inlined]
 [14] eval(::Expr) at ./client.jl:393
 [15] top-level scope at ./none:3
during initialization of module CUDAdrv
in expression starting at /scratch/snx3000/omlins/julia_cuda_test_install/packages/CUDAnative/reYWn/src/CUDAnative.jl:4
ERROR: Failed to precompile CUDAnative [be33ccc6-a3ff-5ff2-a52e-74243cff1e17] to /scratch/snx3000/omlins/julia_cuda_test_install/compiled/v1.0/CUDAnative/4Zu2W.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1203
 [3] _require(::Base.PkgId) at ./loading.jl:960
 [4] require(::Base.PkgId) at ./loading.jl:858
 [5] require(::Module, ::Symbol) at ./loading.jl:853

julia> 
omlins@daint104:/scratch/snx3000/omlins/julia_cuda_test_install> srun -Cgpu -pdebug --pty bash
srun: job 16302025 queued and waiting for resources
srun: job 16302025 has been allocated resources
omlins@nid03510:/scratch/snx3000/omlins/julia_cuda_test_install> module unload julia
omlins@nid03510:/scratch/snx3000/omlins/julia_cuda_test_install> module load julia/1.0.4
omlins@nid03510:/scratch/snx3000/omlins/julia_cuda_test_install> export JULIA_DEPOT_PATH=.
omlins@nid03510:/scratch/snx3000/omlins/julia_cuda_test_install> julia
(v1.0.4-gpu) pkg> status
    Status `/scratch/snx3000/omlins/julia_cuda_test_install/environments/v1.0.4-gpu/Project.toml`
  [c5f51814] CUDAdrv v3.1.0 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
  [be33ccc6] CUDAnative v2.2.1 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
  [3a865a2d] CuArrays v1.1.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)

julia> using CUDAdrv

julia> using CUDAnative
[ Info: Precompiling CUDAnative [be33ccc6-a3ff-5ff2-a52e-74243cff1e17]

Do you also have a solution for this, @maleadt?

You can’t. Precompilation embeds system-specific information into the image (where CUDA is installed, the type of hardware, the supported compute capability, etc.), so you can’t do that on the login node, unless the login node is guaranteed to be identical to the compute nodes, in which case you should be able to initialize CUDA there (without triggering error 999).
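In the identical-nodes case, shipping the precompiled caches amounts to copying the depot’s compiled directory. A hypothetical sketch, using the depot path from the session above; the destination path is an assumption and will differ per cluster:

```shell
# Only valid when the login node is identical to the compute nodes.
DEPOT=/scratch/snx3000/omlins/julia_cuda_test_install

# After precompiling, the .ji cache files sit under the depot:
ls "$DEPOT/compiled/v1.0/"

# Copy them to wherever the compute nodes' depot lives
# (destination host and path are assumptions):
# rsync -a "$DEPOT/compiled/" computenode:/path/to/depot/compiled/
```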

I am also using the standard ML stack of Flux-CuArrays-CUDAnative-CUDAdrv on a cluster, and this issue has been causing much pain. If the packages could be built on a system without a GPU and still work on a GPU, life would become much easier.

So basically https://github.com/JuliaGPU/CUDAnative.jl/pull/435?

Hopefully, yes. Thanks for pointing this out. I’ll evaluate it when I have time.

So can you, really?

Installing/building a package and precompiling it are two separate things. Since we need to deal efficiently with certain hardware- and system-specific properties (e.g. the CUDA version, compute capability, etc.), one of those steps needs to run on the final system with a functional GPU. This used to be the Pkg.build phase, which made it hard to install CUDA-related packages on e.g. a networked login node without a GPU. In the above PR, I’ve moved that logic to happen during precompilation, meaning you can now Pkg.add on the login node, but you will need to precompile on a compute node that has a GPU. If you know that your compute nodes are identical to the login node, you can precompile ahead of time and ship the ji files to the compute nodes; that isn’t the case here, though, since the login node isn’t able to initialize CUDA (which, again, we need in order to discover those system-specific properties).
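The resulting workflow might be sketched as a batch script that triggers precompilation on a GPU node after the packages were added on the login node. This is a hypothetical sketch: the job name, time limit, and depot path are assumptions, while the module name, partition, and constraint follow the session above and will differ per cluster:

```shell
#!/bin/bash -l
#SBATCH --job-name=precompile_cuda
#SBATCH --partition=debug
#SBATCH --constraint=gpu
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Assumes the packages were already Pkg.add-ed on the login node
# into this depot.
module load julia/1.0.4
export JULIA_DEPOT_PATH=/scratch/snx3000/omlins/julia_cuda_test_install

# Loading the packages on a node with a functional GPU triggers
# precompilation; the .ji files land under $JULIA_DEPOT_PATH/compiled/
# and are reused on subsequent runs.
julia -e 'using CUDAdrv, CUDAnative'
```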

Building was discussed in the previous topic, which I linked in the topic description above. There, in my last post, you can see the shell output of a successful build of the master versions of the CUDA packages on a login node without a GPU and without a CUDA driver.