CuArrays: error calling CuArray() (ERROR_INVALID_DEVICE)


#21

You didn’t execute all of the code. Do subsequent operations still work, and does the log show a successful context construction? This really doesn’t make sense; are you sure there’s no other application using the GPU in the background (try shell> nvidia-smi from within the Julia session)?
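For reference, both checks can be done from the running session. A minimal sketch, assuming nvidia-smi is on the PATH (CuDevice, CuContext and destroy! are the CUDAdrv names used later in this thread):

julia> using CUDAdrv

shell> nvidia-smi   # list other processes holding the GPU, without leaving Julia

julia> dev = CuDevice(0);

julia> ctx = CuContext(dev);   # fails with ERROR_INVALID_DEVICE if the device cannot be acquired

julia> destroy!(ctx)           # release the test context again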


#22

@maleadt: I guess by “all code” you mean e.g. what I executed in post 4 above. Here you see the result of its execution:

> julia

shell> grep -3 "cuDriverGetVersion" .julia/packages/CUDAnative/AGfq2/src/init.jl
#
# feel free to open a PR adding additional API calls, if you have a specific use for them.
const preinit_apicalls = Set{Symbol}([
    :cuDriverGetVersion,
    # device calls, commonly used to determine the most appropriate device
    :cuDeviceGet,
    :cuDeviceGetAttribute,

shell> grep -3 "Creating a CuContext" .julia/packages/CUDAdrv/LC5XS/src/context.jl
    error("CuContext cannot be copied")

function CuContext(dev::CuDevice, flags::CUctx_flags=SCHED_AUTO)
    @info "Creating a CuContext" dev flags
    Base.show_backtrace(stderr, stacktrace())
    println(stderr)
    handle_ref = Ref{CuContext_t}()

julia> using CuArrays

julia> B = ones(10_000) ./ 2;

julia> A = ones(10_000) .* π;

julia> cuB = CuArray(B);
┌ Info: Creating a CuContext
│   dev = CuDevice(0): Tesla P100-PCIE-16GB
└   flags = SCHED_AUTO::CUctx_flags = 0

Stacktrace:
 [1] macro expansion at ./logging.jl:305 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0
 [21] eval(::Module, ::Any) at ./boot.jl:319
 [22] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
 [23] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
 [24] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./task.jl:259
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at ./logging.jl:315 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0

julia> cuB = CuArray(B);

julia> cuA = CuArray(A);

julia> cuC = 2 .* cuA ./ cuB;

julia> all(cuC .≈ 4*π);

julia> all(cuC .≈ 4*π)
true

julia> typeof(cuC)
CuArray{Float64,1}

shell> nvidia-smi
Mon Jan  7 09:23:57 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   28C    P0    30W / 250W |    299MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17745      C   julia                                        289MiB |
+-----------------------------------------------------------------------------+

julia> 

You can see that only Julia is running on the GPU. The compute node was allocated by SLURM with exclusive access, and the instructions above are the only ones I executed. The behaviour when calling CuArray is the same as before, apart from the additional error information now being printed: the first CuArray call fails, while the repeated call and all subsequent operations succeed.
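One detail that might be relevant: the nvidia-smi output above reports the compute mode as “E. Process” (exclusive process), which restricts which contexts may be created on the device. As a hedged check rather than a confirmed diagnosis, the mode can be queried explicitly from the session (compute_mode is a standard nvidia-smi query field):

shell> nvidia-smi --query-gpu=compute_mode --format=csv

If it reports Exclusive_Process, the failing first context construction could plausibly be a consequence of that mode; an administrator can switch the device back with nvidia-smi -c DEFAULT (requires root).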