CuArrays: error calling CuArray() (ERROR_INVALID_DEVICE)

Hi all,

I am new to Julia and would like to try out CuArrays. However, I get the following error when following the basic example from the Julia doc:

julia> using CuArrays

julia> 
julia> B = ones(10_000) ./ 2;

julia> A = ones(10_000) .* π;

julia> C = 2 .* A ./ B;

julia> all(C .≈ 4*π)
true

julia> cuB = CuArray(B);
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:39 [inlined] (repeats 2 times)
 [6] macro expansion at ./logging.jl:305 [inlined]
 [7] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:31
 [8] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [9] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [10] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [11] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [12] macro expansion at ./util.jl:213 [inlined]
 [13] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [14] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [15] macro expansion at ./util.jl:213 [inlined]
 [16] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [17] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [19] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [20] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [21] top-level scope at none:0

Note that there is only 1 GPU available (which obviously has '0' as its device ID):

> nvidia-smi
Fri Dec 14 11:51:25 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   30C    P0    29W / 250W |      0MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The installed CUDA version is 9.1.

Thanks for any help!

Sam

That’s weird. What happens if you do CUDAnative.device!(0) first? What’s the output of CUDAdrv.CuDevice(0) (after importing the relevant packages of course)?

Thanks for your immediate reply @maleadt! Calling CUDAnative.device!(0) gives the same error:

julia> using CUDAnative, CUDAdrv, CuArrays

julia> CUDAnative.device!(0)
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] device!(::Int64) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:75
 [6] top-level scope at none:0

However, for some reason, calling CUDAdrv.CuDevice(0) solves the problem for the call to CuArrays.CuArray() (but not for the call to CUDAnative.device!()):

julia> using CUDAnative, CUDAdrv, CuArrays

julia> CUDAdrv.CuDevice(0)
CuDevice(0): Tesla P100-PCIE-16GB

julia> CUDAnative.device!(0)
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] device!(::Int64) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:75
 [6] top-level scope at none:0

julia> B = ones(10_000) ./ 2;

julia> cuB = CuArrays.CuArray(B);

julia> 

Actually, it is sufficient to call CuArray() twice. From the second call onward, CuArray() works:

julia> using CuArrays

julia> B = ones(10_000) ./ 2;

julia> A = ones(10_000) .* π;

julia> cuB = CuArray(B);
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:39 [inlined] (repeats 2 times)
 [6] macro expansion at ./logging.jl:305 [inlined]
 [7] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:31
 [8] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [9] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [10] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [11] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [12] macro expansion at ./util.jl:213 [inlined]
 [13] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [14] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [15] macro expansion at ./util.jl:213 [inlined]
 [16] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [17] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [19] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [20] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [21] top-level scope at none:0

julia> cuB = CuArray(B);

julia> cuA = CuArray(A);

julia> cuC = 2 .* cuA ./ cuB;


julia> 

julia> all(cuC .≈ 4*π);

julia> all(cuC .≈ 4*π)
true

julia> typeof(cuC)
CuArray{Float64,1}

julia> 

It looks like somehow some (lazy) initialization is triggered too late.

Why do you think that? It’s the initialization itself that is failing, so I’m not sure what’s up.

Could you try CuContext(CuDevice(0)) using only CUDAdrv (in a clean session, without loading CUDAnative or CuArrays)? Does that consistently fail?

Why do you think that? It’s the initialization itself that is failing, so I’m not sure what’s up.

Well, due to the fact that after a first failing call to CuArray() everything seems to work as it should, I thought that the required initialization is triggered successfully, but somehow occurs too late. But as I do not know at all how it is implemented, I don’t think I can be of much help in reasoning about why the error occurs. I can, however, do little tests that you tell me to do in order to track down the source of the error :slight_smile:

Here is the test you asked for:

julia> using CUDAdrv

julia> CuContext(CuDevice(0))
CuContext(Ptr{Nothing} @0x0000000000e75c90, true, true)

julia> CuContext(CuDevice(0))
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] CuContext(::CuDevice) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111
 [4] top-level scope at none:0

julia> CuContext(CuDevice(0))
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] CuContext(::CuDevice) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111
 [4] top-level scope at none:0

That is really weird. Something seems off with your GPU/set-up. Could you try running that code under cuda-memcheck to see if the error comes from another, hidden API call (context creation can return error codes from other API calls)? It really should not behave like that, and if it does, I’m not sure I can do anything about it.

Here is the output from running with cuda-memcheck:

> cat test.jl
using CUDAdrv

CuContext(CuDevice(0))
CuContext(CuDevice(0))

> cuda-memcheck julia test.jl 
========= CUDA-MEMCHECK
========= Program hit CUDA_ERROR_INVALID_DEVICE (error 101) due to "invalid device ordinal" on CUDA API call to cuCtxCreate_v2. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/opt/cray/nvidia/default/lib64/libcuda.so (cuCtxCreate_v2 + 0x168) [0x2286e8]
=========     Host Frame:[0x2aaac3e6b450]
=========     Host Frame:[0x2aaac3e6a2b9]
=========     Host Frame:[0x2aaac3e6a316]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x136) [0x48176]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x1b1740]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x1b1469]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x1b1dec]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x1b256f]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x5e5ec]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x1b303d]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x7dd9c]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 [0x5266e]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 (jl_load + 0x53) [0x7f0e3]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/lib/julia/sys.so [0xc1eee6]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/lib/julia/sys.so [0x6aa89d]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x136) [0x48176]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/lib/julia/sys.so [0xc27edb]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/lib/julia/sys.so [0x6aa59d]
=========     Host Frame:/users/omlins/julia/julia-1.0.2/bin/../lib/libjulia.so.1 (jl_apply_generic + 0x136) [0x48176]
=========     Host Frame:julia [0x1ae9]
=========     Host Frame:julia [0x1514]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x20725]
=========     Host Frame:julia [0x15b5]
=========
ERROR: LoadError: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] CuContext(::CuDevice) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111
 [4] top-level scope at none:0
 [5] include at ./boot.jl:317 [inlined]
 [6] include_relative(::Module, ::String) at ./loading.jl:1044
 [7] include(::Module, ::String) at ./sysimg.jl:29
 [8] exec_options(::Base.JLOptions) at ./client.jl:231
 [9] _start() at ./client.jl:425
in expression starting at /users/omlins/0_misc/hpc_julia/0_invalid_device_error_debug/test.jl:4
========= ERROR SUMMARY: 1 error

Nothing new there, the memcheck trace corresponds with our stack trace.

I see your libcuda is in /opt/cray, are you by any chance using some exotic system? Virtualized GPU? Are you sure libcuda matches the loaded kernel driver? Normally libcuda is in /lib because it is tied very strongly to your kernel and its modules.

Again, this API behavior is highly unusual and not documented, so I’m suspecting a deeper issue here.

I see your libcuda is in /opt/cray, are you by any chance using some exotic system? Virtualized GPU?

It is a CRAY system; the programming environment including CUDA is provided by CRAY. No, the GPUs are not virtualized.

Are you sure libcuda matches the loaded kernel driver?

It looks like it does; libcuda.so links to libnvidia-fatbinaryloader.so.396.44, where 396.44 corresponds to the driver version (see the nvidia-smi output in the topic description above):

> ldd /opt/cray/nvidia/default/lib64/libcuda.so
	(...)
	libnvidia-fatbinaryloader.so.396.44 => /opt/cray/nvidia/default/lib64/libnvidia-fatbinaryloader.so.396.44 (0x00002aaaac915000)
	(...)

Here is a little correction to my post from Dec 14, 12:28 above:
I noted that

calling CUDAdrv.CuDevice(0) solves the problem for the call to CuArrays.CuArray() (but not for the call to CUDAnative.device!())

Unfortunately, this is not true. It was the preceding (failing!) call to CUDAnative.device!(0) that made the call to CuArrays.CuArray() succeed. As you can see below, CuArrays.CuArray() fails if only CUDAdrv.CuDevice(0) is called first:

julia> using CUDAnative, CUDAdrv, CuArrays

julia> CUDAdrv.CuDevice(0)
CuDevice(0): Tesla P100-PCIE-16GB

julia> B = ones(10_000) ./ 2;

julia> cuB = CuArrays.CuArray(B);

ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
(...)

However, CuArrays.CuArray() succeeds if CUDAnative.device!(0) is called first, even though this call itself fails persistently:

julia> using CUDAnative, CUDAdrv, CuArrays

julia> CUDAnative.device!(0)
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
(...)

julia> B = ones(10_000) ./ 2;

julia> cuB = CuArrays.CuArray(B);

julia> CUDAnative.device!(0)
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
(...)

To sum up, so far there are two ways to make a call to CuArrays.CuArray() succeed:

  • first call CUDAnative.device!(0) once
  • first call CuArrays.CuArray() itself once (and let that first call fail)

Unfortunately, neither of these is a good workaround, as both calls raise an error, which makes a script abort. @maleadt: would you have an idea for a different possible workaround that would not cause an error?
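For illustration, the best I can do right now is to swallow the error of a dummy first call (a hypothetical sketch based on the observations above, not a proper fix):

using CUDAnative, CuArrays

# Hypothetical sketch: trigger the (failing) lazy initialization once and swallow
# the error, so that subsequent CuArray() calls succeed (as observed above).
try
    CUDAnative.device!(0)
catch err
    @warn "ignoring error from initial device!() call" err
end

cuB = CuArray(ones(10_000) ./ 2)  # works after the failed first attempt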

Thanks in advance!

Sam

I’m getting similar errors when running on a cluster with a gpu allocated.

Below is the output of some of the stuff suggested before.

Happy to help more with debugging this.

[amanela@dl01 char-rnn]$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.1 (2018-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(v1.0) pkg> activate cuda

(cuda) pkg> st
    Status `~/gitspace/model-zoo/text/char-rnn/cuda/Project.toml`
  [c5f51814] CUDAdrv v0.8.6
  [be33ccc6] CUDAnative v0.9.1
  [3a865a2d] CuArrays v0.8.1
  [587475ba] Flux v0.6.10
  [2913bbd2] StatsBase v0.27.0

julia> using CUDAdrv

julia> CuContext(CuDevice(0))
CuContext(Ptr{Nothing} @0x0000000001d24c80, true, true)

julia> CuContext(CuDevice(0))
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] CuContext(::CuDevice) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111
 [4] top-level scope at none:0

julia> CuContext(CuDevice(0))
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] CuContext(::CuDevice) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111
 [4] top-level scope at none:0

julia> using CUDAnative, CuArrays
ERROR: InitError: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CuDevice) at /home/amanela/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] initialize at /home/amanela/.julia/packages/CUDAnative/AGfq2/src/init.jl:39 [inlined] (repeats 2 times)
 [6] macro expansion at ./logging.jl:307 [inlined]
 [7] maybe_initialize(::Symbol) at /home/amanela/.julia/packages/CUDAnative/AGfq2/src/init.jl:31
 [8] macro expansion at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [9] device() at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:172
 [10] device at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:168 [inlined]
 [11] __init__() at /home/amanela/.julia/packages/CuArrays/f4Eke/src/blas/CUBLAS.jl:39
 [12] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:630
 [13] macro expansion at ./logging.jl:312 [inlined]
 [14] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:701
 [15] _require(::Base.PkgId) at ./loading.jl:934
 [16] require(::Base.PkgId) at ./loading.jl:855
 [17] macro expansion at ./logging.jl:311 [inlined]
 [18] require(::Module, ::Symbol) at ./loading.jl:837
during initialization of module CUBLAS

julia> CUDAdrv.CuDevice(0)
CuDevice(0): Tesla V100-PCIE-16GB

julia> CUDAnative.device!(0)
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:112
 [3] Type at /home/amanela/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 [inlined]
 [4] device!(::CuDevice) at /home/amanela/.julia/packages/CUDAnative/AGfq2/src/init.jl:68
 [5] device!(::Int64) at /home/amanela/.julia/packages/CUDAnative/AGfq2/src/init.jl:75
 [6] top-level scope at none:0

shell> nvidia-smi
Sun Dec 30 15:02:22 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:DA:00.0 Off |                    0 |
| N/A   26C    P0    33W / 250W |    418MiB / 16130MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    127639      C   julia                                        407MiB |
+-----------------------------------------------------------------------------+

julia> ENV["LD_LIBRARY_PATH"]
"/export/cuda-10.0/lib64:/export/src/julia-1.0.1/lib/julia:/export/src/julia-1.0.1/lib"

julia> ENV["PATH"]
"/export/cuda-10.0/bin:/export/src/julia-1.0.1/bin:/usr/local/bin:/usr/local/sbin:/usr/lib64/qt-3.3/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/cuda/bin:/act/bin:/home/amanela/bin"

julia> 

Just a short additional remark: I have been experimenting with CUDAnative lately and have not run into any problems with that so far. So I believe that the problem is specific to CuArrays.

After looking into this a little, I’m thinking you guys are using a GPU in exclusive-process compute mode, where only a single context can be created per process. Please verify with:

$ nvidia-smi -q | grep "Compute Mode"
    Compute Mode                    : Exclusive_Process

That means the call to CuContext(CuDevice(0)) is expected to fail the second time; nothing is wrong there.
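To illustrate (a minimal sketch, assuming device 0; in Exclusive_Process mode only the first context created per process is valid, so it has to be reused):

using CUDAdrv

# Sketch: in exclusive-process compute mode only one context per process is allowed,
# so create it once and keep reusing it instead of constructing a new one each time.
dev = CuDevice(0)
ctx = CuContext(dev)    # succeeds: first (and only allowed) context for this process
# CuContext(dev)        # a second creation fails with ERROR_INVALID_DEVICE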

However, going back to the original problem of @samo, CuArrays should only ever create a single context, and cache it afterwards. I can’t seem to reproduce the issue, although I found a redundant context initialization that might cause some issues. Could you guys try https://github.com/JuliaGPU/CUDAnative.jl/pull/312?
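For reference, by "create a single context and cache it" I mean roughly the following pattern (a sketch of the intended behaviour, not the actual CuArrays code):

using CUDAdrv

# Sketch: lazily create the context on first use and hand out the cached one afterwards.
const _cached_ctx = Ref{Union{Nothing,CuContext}}(nothing)

function cached_context(dev::CuDevice=CuDevice(0))
    if _cached_ctx[] === nothing
        _cached_ctx[] = CuContext(dev)
    end
    return _cached_ctx[]
end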

EDIT: if that doesn’t work, please re-run failing code with the following patch applied:

CUDAdrv$ git diff
diff --git a/src/context.jl b/src/context.jl
index 3d7be17..a8f69ef 100644
--- a/src/context.jl
+++ b/src/context.jl
@@ -108,6 +108,9 @@ Base.deepcopy_internal(::CuContext, ::IdDict) =
     error("CuContext cannot be copied")
 
 function CuContext(dev::CuDevice, flags::CUctx_flags=SCHED_AUTO)
+    @info "Creating a CuContext" dev flags
+    Base.show_backtrace(stderr, stacktrace())
+    println(stderr)
     handle_ref = Ref{CuContext_t}()
     @apicall(:cuCtxCreate, (Ptr{CuContext_t}, Cuint, Cint),
                            handle_ref, flags, dev)

You are right about the process exclusive mode. I get:

> nvidia-smi -q | grep "Compute Mode"
    Compute Mode                    : Exclusive_Process

I will try out the remaining things you noted…

@maleadt: how do I install the version from your PR in order to try it?

You can pkg> add CUDAnative#tb/version_init, but that branch might be incompatible with the rest of your set-up (specifically, if you are working with Flux). You’d need to upgrade CuArrays to master as well: pkg> add CuArrays#master. Alternatively, just apply the fix; it’s a single-line change.
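Concretely, in the Pkg REPL (press ] at the julia> prompt; assuming the default environment prompt) that would be, using the exact branch names from above:

(v1.0) pkg> add CUDAnative#tb/version_init
(v1.0) pkg> add CuArrays#master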

@maleadt: Sorry for the basic Julia-newcomer question: to do the "single-line change", do I modify .julia/packages/CUDAnative/AGfq2/src/init.jl as in your PR and then run Pkg.build("CUDAnative")?

Yeah, no need to rebuild even.

Adding the line

:cuDriverGetVersion,

to .julia/packages/CUDAnative/AGfq2/src/init.jl
didn’t solve the problem.

So, I applied your patch and ran the code again. Here is the output, showing first the modified file content:

> julia

shell> grep -3 "cuDriverGetVersion" .julia/packages/CUDAnative/AGfq2/src/init.jl
#
# feel free to open a PR adding additional API calls, if you have a specific use for them.
const preinit_apicalls = Set{Symbol}([
    :cuDriverGetVersion,
    # device calls, commonly used to determine the most appropriate device
    :cuDeviceGet,
    :cuDeviceGetAttribute,

shell> grep -3 "Creating a CuContext" .julia/packages/CUDAdrv/LC5XS/src/context.jl
    error("CuContext cannot be copied")

function CuContext(dev::CuDevice, flags::CUctx_flags=SCHED_AUTO)
    @info "Creating a CuContext" dev flags
    Base.show_backtrace(stderr, stacktrace())
    println(stderr)
    handle_ref = Ref{CuContext_t}()

julia> using CUDAnative, CUDAdrv, CuArrays

julia> B = ones(10_000) ./ 2;

julia> cuB = CuArrays.CuArray(B);
┌ Info: Creating a CuContext
│   dev = CuDevice(0): Tesla P100-PCIE-16GB
└   flags = SCHED_AUTO::CUctx_flags = 0

Stacktrace:
 [1] macro expansion at ./logging.jl:305 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArrays.CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArrays.CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0
 [21] eval(::Module, ::Any) at ./boot.jl:319
 [22] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
 [23] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
 [24] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./task.jl:259
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at ./logging.jl:315 [inlined]
 [2] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArrays.CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArrays.CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0

julia>