CuArrays: error calling CuArray() (ERROR_INVALID_DEVICE)

You didn’t execute all of the code. Do subsequent operations still work, and do they show a successful context construction? This really doesn’t make sense. Are you sure there’s no other application using the GPU in the background (try `shell> nvidia-smi` from within the Julia session)?
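One way to check for competing processes is to query `nvidia-smi` for compute applications. A minimal sketch: the live query is left as a comment (it needs an NVIDIA driver), and the field extraction is demonstrated on a captured sample line instead, so the parsing itself is runnable anywhere.

```shell
# On a live system, list processes holding GPU contexts with:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
# Below, parse one sample output line (format: pid, name, memory) to get the PID:
sample='17745, julia, 289 MiB'
pid=$(printf '%s\n' "$sample" | cut -d',' -f1)
name=$(printf '%s\n' "$sample" | cut -d',' -f2 | tr -d ' ')
echo "GPU in use by PID $pid ($name)"
```

If anything other than your own Julia process shows up, that process may be holding the device.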

@maleadt: I guess with “all code” you mean e.g. what I executed in post 4 above. Here you see the result of its execution:

$ julia

shell> grep -3 "cuDriverGetVersion" .julia/packages/CUDAnative/AGfq2/src/init.jl
#
# feel free to open a PR adding additional API calls, if you have a specific use for them.
const preinit_apicalls = Set{Symbol}([
    :cuDriverGetVersion,
    # device calls, commonly used to determine the most appropriate device
    :cuDeviceGet,
    :cuDeviceGetAttribute,

shell> grep -3 "Creating a CuContext" .julia/packages/CUDAdrv/LC5XS/src/context.jl
    error("CuContext cannot be copied")

function CuContext(dev::CuDevice, flags::CUctx_flags=SCHED_AUTO)
    @info "Creating a CuContext" dev flags
    Base.show_backtrace(stderr, stacktrace())
    println(stderr)
    handle_ref = Ref{CuContext_t}()

julia> using CuArrays

julia> B = ones(10_000) ./ 2;

julia> A = ones(10_000) .* π;

julia> cuB = CuArray(B);
┌ Info: Creating a CuContext
│   dev = CuDevice(0): Tesla P100-PCIE-16GB
└   flags = SCHED_AUTO::CUctx_flags = 0

Stacktrace:
 [1] macro expansion at ./logging.jl:305 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0
 [21] eval(::Module, ::Any) at ./boot.jl:319
 [22] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
 [23] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
 [24] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./task.jl:259
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] macro expansion at ./logging.jl:315 [inlined]
 [2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
 [3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
 [4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
 [5] macro expansion at ./logging.jl:305 [inlined]
 [6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
 [7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
 [8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
 [11] macro expansion at ./util.jl:213 [inlined]
 [12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
 [13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [14] macro expansion at ./util.jl:213 [inlined]
 [15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
 [18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
 [19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
 [20] top-level scope at none:0

julia> cuB = CuArray(B);

julia> cuA = CuArray(A);

julia> cuC = 2 .* cuA ./ cuB;

julia> all(cuC .≈ 4*π);

julia> all(cuC .≈ 4*π)
true

julia> typeof(cuC)
CuArray{Float64,1}

shell> nvidia-smi
Mon Jan  7 09:23:57 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   28C    P0    30W / 250W |    299MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17745      C   julia                                        289MiB |
+-----------------------------------------------------------------------------+

julia> 

You can see that only Julia is running on the GPU. The compute node was allocated by SLURM with exclusive access, and the above instructions are the only ones I executed. The behaviour when calling CuArray is as before, apart from the printing of the additional error information.

UPDATE: this problem has meanwhile been solved in newer versions of CuArrays… thanks @maleadt!


I’m still getting errors that appear related. Any ideas?

(Transformers) pkg> test CuArrays
   Testing CuArrays
 Resolving package versions...
Project Transformers v0.1.2
    Status `/tmp/jl_ZnrWvm/Manifest.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v1.0.0
  [b99e7846] BinaryProvider v0.5.8
  [fa961155] CEnum v0.2.0
  [3895d2a7] CUDAapi v3.1.0
  [c5f51814] CUDAdrv v6.0.0
  [be33ccc6] CUDAnative v2.10.2
  [bbf7d656] CommonSubexpressions v0.2.0
  [3a865a2d] CuArrays v1.7.2
  [864edb3b] DataStructures v0.17.9
  [163ba53b] DiffResults v1.0.2
  [b552c78f] DiffRules v1.0.0
  [7a1cc6ca] FFTW v1.2.0
  [f5851436] FFTW_jll v3.3.9+3
  [1a297f60] FillArrays v0.8.4
  [f6369f11] ForwardDiff v0.10.9
  [0c68f7d7] GPUArrays v2.0.1
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+0
  [929cbde3] LLVM v1.3.3
  [856f044c] MKL_jll v2019.0.117+2
  [1914dd2f] MacroTools v0.5.3
  [872c559c] NNlib v0.6.4
  [77ba4419] NaNMath v0.3.3
  [efe28fd5] OpenSpecFun_jll v0.5.3+1
  [bac558e1] OrderedCollections v1.1.0
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v0.5.2
  [276daf66] SpecialFunctions v0.9.0
  [90137ffa] StaticArrays v0.12.1
  [a759f4b9] TimerOutputs v0.5.3
  [2a0f44e3] Base64  [`@stdlib/Base64`]
  [ade2ca70] Dates  [`@stdlib/Dates`]
  [8ba89e20] Distributed  [`@stdlib/Distributed`]
  [b77e0a4c] InteractiveUtils  [`@stdlib/InteractiveUtils`]
  [76f85450] LibGit2  [`@stdlib/LibGit2`]
  [8f399da3] Libdl  [`@stdlib/Libdl`]
  [37e2e46d] LinearAlgebra  [`@stdlib/LinearAlgebra`]
  [56ddb016] Logging  [`@stdlib/Logging`]
  [d6f4376e] Markdown  [`@stdlib/Markdown`]
  [44cfe95a] Pkg  [`@stdlib/Pkg`]
  [de0858da] Printf  [`@stdlib/Printf`]
  [3fa0cd96] REPL  [`@stdlib/REPL`]
  [9a3f8284] Random  [`@stdlib/Random`]
  [ea8e919c] SHA  [`@stdlib/SHA`]
  [9e88b42a] Serialization  [`@stdlib/Serialization`]
  [6462fe0b] Sockets  [`@stdlib/Sockets`]
  [2f01184e] SparseArrays  [`@stdlib/SparseArrays`]
  [10745b16] Statistics  [`@stdlib/Statistics`]
  [8dfed614] Test  [`@stdlib/Test`]
  [cf7118a7] UUIDs  [`@stdlib/UUIDs`]
  [4ec0a83e] Unicode  [`@stdlib/Unicode`]
[ Info: CuArrays.jl failed to initialize and will be unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
ERROR: LoadError: CUDA error: invalid device ordinal (code 101, ERROR_INVALID_DEVICE)
Stacktrace:
 [1] throw_api_error(::CUDAdrv.cudaError_enum) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/error.jl:131
 [2] macro expansion at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/error.jl:144 [inlined]
 [3] cuCtxCreate_v2 at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/libcuda.jl:108 [inlined]
 [4] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags_enum) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:73
 [5] CuContext at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:72 [inlined]
 [6] CuContext(::var"#5#7", ::CuDevice) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:118
 [7] iterate at ./none:0 [inlined]
 [8] collect(::Base.Generator{CUDAdrv.DeviceSet,var"#4#6"}) at ./array.jl:622
 [9] top-level scope at /home/amanela/.julia/packages/CuArrays/HE8G6/test/runtests.jl:27
 [10] include at ./boot.jl:328 [inlined]
 [11] include_relative(::Module, ::String) at ./loading.jl:1105
 [12] include(::Module, ::String) at ./Base.jl:31
 [13] include(::String) at ./client.jl:424
 [14] top-level scope at none:6
in expression starting at /home/amanela/.julia/packages/CuArrays/HE8G6/test/runtests.jl:27
ERROR: Package CuArrays errored during testing

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

shell> nvidia-smi -q

==============NVSMI LOG==============

Timestamp                           : Thu Feb 13 16:18:56 2020
Driver Version                      : 418.39
CUDA Version                        : 10.1

Attached GPUs                       : 1
GPU 00000000:3D:00.0
    Product Name                    : Tesla V100-PCIE-32GB
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0561519008380
    GPU UUID                        : GPU-5b191013-a305-1cf6-9ca3-8c02b838eedc
    Minor Number                    : 0
    VBIOS Version                   : 88.00.7E.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x3d00
    GPU Part Number                 : 900-2G500-0010-000
    Inforom Version
        Image Version               : G500.0202.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x3D
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB610DE
        Bus Id                      : 00000000:3D:00.0
        Sub System Id               : 0x124A10DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32480 MiB
        Used                        : 0 MiB
        Free                        : 32480 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 2 MiB
        Free                        : 32766 MiB
    Compute Mode                    : Exclusive_Process
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 29 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 26 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 34.09 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1230 MHz
        SM                          : 1230 MHz
        Memory                      : 877 MHz
        Video                       : 1110 MHz
    Applications Clocks
        Graphics                    : 1230 MHz
        Memory                      : 877 MHz
    Default Applications Clocks
        Graphics                    : 1230 MHz
        Memory                      : 877 MHz
    Max Clocks
        Graphics                    : 1380 MHz
        SM                          : 1380 MHz
        Memory                      : 877 MHz
        Video                       : 1237 MHz
    Max Customer Boost Clocks
        Graphics                    : 1380 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

You are using a device in process-exclusive mode, so it’s likely that another running process is already using the GPU for compute.
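In `Exclusive_Process` compute mode, only one process may hold a CUDA context on the device, so a second context creation fails with `ERROR_INVALID_DEVICE`. A hedged sketch of how to inspect this setting: the live query and the (root-only) reset are shown as comments, and the field extraction is demonstrated on a captured sample line from the `nvidia-smi -q` output above.

```shell
# On a live system, query the mode with:
#   nvidia-smi --query-gpu=compute_mode --format=csv,noheader
# An administrator can switch back to the shared default with:
#   sudo nvidia-smi -c DEFAULT
# Extract the value from a captured `nvidia-smi -q` line:
sample='    Compute Mode                    : Exclusive_Process'
mode=$(printf '%s\n' "$sample" | sed 's/.*: //')
echo "$mode"
```

A mode of `Default` allows multiple processes to share the GPU; `Exclusive_Process` restricts it to one.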

Thanks! I think I had a different issue with cudnn lib availability: when I first used CuArrays in the session and then called `test CuArrays`, the device was already in use.