You didn’t execute all the code. Do subsequent operations still work, and can you show a successful context construction? This really doesn’t make sense; are you sure there’s no other application using the GPU in the background? (Try shell> nvidia-smi from within the Julia session.)
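For reference, a successful manual context construction would look roughly like this (a minimal sketch against the CUDAdrv API of this era; CuDevice(0) assumes the first visible device):

julia> using CUDAdrv

julia> dev = CuDevice(0);      # pick the first visible device (assumption)

julia> ctx = CuContext(dev);   # should return without ERROR_INVALID_DEVICE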
@maleadt: I guess with “all code” you mean e.g. what I executed in post 4 above. Here you see the result of its execution:
$ julia
shell> grep -3 "cuDriverGetVersion" .julia/packages/CUDAnative/AGfq2/src/init.jl
#
# feel free to open a PR adding additional API calls, if you have a specific use for them.
const preinit_apicalls = Set{Symbol}([
:cuDriverGetVersion,
# device calls, commonly used to determine the most appropriate device
:cuDeviceGet,
:cuDeviceGetAttribute,
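(For context, this whitelist is consulted before an API call triggers CUDAnative’s lazy initialization; roughly, as a simplified self-contained sketch rather than the actual source — the `initialized` flag and the `initialize` stub are assumptions for illustration:)

const preinit_apicalls = Set{Symbol}([:cuDriverGetVersion, :cuDeviceGet, :cuDeviceGetAttribute])
const initialized = Ref(false)
initialize() = (initialized[] = true)        # stub; the real one creates a CuContext

function maybe_initialize(apicall::Symbol)
    initialized[] && return                  # already set up
    apicall in preinit_apicalls && return    # whitelisted: safe before a context exists
    initialize()                             # anything else triggers lazy init
end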
shell> grep -3 "Creating a CuContext" .julia/packages/CUDAdrv/LC5XS/src/context.jl
error("CuContext cannot be copied")
function CuContext(dev::CuDevice, flags::CUctx_flags=SCHED_AUTO)
@info "Creating a CuContext" dev flags
Base.show_backtrace(stderr, stacktrace())
println(stderr)
handle_ref = Ref{CuContext_t}()
julia> using CuArrays
julia> B = ones(10_000) ./ 2;
julia> A = ones(10_000) .* π;
julia> cuB = CuArray(B);
┌ Info: Creating a CuContext
│ dev = CuDevice(0): Tesla P100-PCIE-16GB
└ flags = SCHED_AUTO::CUctx_flags = 0
Stacktrace:
[1] macro expansion at ./logging.jl:305 [inlined]
[2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
[3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
[4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
[5] macro expansion at ./logging.jl:305 [inlined]
[6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
[7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
[8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
[9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
[10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
[11] macro expansion at ./util.jl:213 [inlined]
[12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
[13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
[14] macro expansion at ./util.jl:213 [inlined]
[15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
[16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
[17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
[18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
[19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
[20] top-level scope at none:0
[21] eval(::Module, ::Any) at ./boot.jl:319
[22] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
[23] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
[24] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./task.jl:259
ERROR: CUDA error: invalid device ordinal (code #101, ERROR_INVALID_DEVICE)
Stacktrace:
[1] macro expansion at ./logging.jl:315 [inlined]
[2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/context.jl:111 (repeats 2 times)
[3] device!(::CUDAdrv.CuDevice) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:69
[4] initialize at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:40 [inlined] (repeats 2 times)
[5] macro expansion at ./logging.jl:305 [inlined]
[6] maybe_initialize(::Symbol) at /users/omlins/.julia/packages/CUDAnative/AGfq2/src/init.jl:32
[7] macro expansion at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/base.jl:140 [inlined]
[8] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
[9] alloc at /users/omlins/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
[10] macro expansion at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:237 [inlined]
[11] macro expansion at ./util.jl:213 [inlined]
[12] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:236
[13] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
[14] macro expansion at ./util.jl:213 [inlined]
[15] alloc(::Int64) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
[16] CuArray{Float64,1}(::Tuple{Int64}) at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:45
[17] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:99 [inlined]
[18] convert at /users/omlins/.julia/packages/CuArrays/f4Eke/src/array.jl:105 [inlined]
[19] CuArray(::Array{Float64,1}) at /users/omlins/.julia/packages/GPUArrays/hzyWn/src/construction.jl:36
[20] top-level scope at none:0
julia> cuB = CuArray(B);
julia> cuA = CuArray(A);
julia> cuC = 2 .* cuA ./ cuB;
julia> all(cuC .≈ 4*π);
julia> all(cuC .≈ 4*π)
true
julia> typeof(cuC)
CuArray{Float64,1}
shell> nvidia-smi
Mon Jan  7 09:23:57 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   28C    P0    30W / 250W |    299MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17745      C   julia                                        289MiB |
+-----------------------------------------------------------------------------+
julia>
You can see that only Julia is running on the GPU. The compute node was allocated by SLURM with exclusive access, and the above instructions are the only ones I executed. The behaviour when calling CuArray
is the same as before, apart from the additional error information that is now printed.
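(For context, the node was requested along these lines; the exact flags are site-specific, so this is only a sketch:)

$ salloc --nodes=1 --exclusive --gres=gpu:1   # exclusive node with one GPU
$ srun --pty julia                            # start Julia on the allocated node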
UPDATE: this problem has meanwhile been solved in newer versions of CuArrays… thanks @maleadt!
I’m still getting errors that appear related. Any ideas?
(Transformers) pkg> test CuArrays
Testing CuArrays
Resolving package versions...
Project Transformers v0.1.2
Status `/tmp/jl_ZnrWvm/Manifest.toml`
[621f4979] AbstractFFTs v0.5.0
[79e6a3ab] Adapt v1.0.0
[b99e7846] BinaryProvider v0.5.8
[fa961155] CEnum v0.2.0
[3895d2a7] CUDAapi v3.1.0
[c5f51814] CUDAdrv v6.0.0
[be33ccc6] CUDAnative v2.10.2
[bbf7d656] CommonSubexpressions v0.2.0
[3a865a2d] CuArrays v1.7.2
[864edb3b] DataStructures v0.17.9
[163ba53b] DiffResults v1.0.2
[b552c78f] DiffRules v1.0.0
[7a1cc6ca] FFTW v1.2.0
[f5851436] FFTW_jll v3.3.9+3
[1a297f60] FillArrays v0.8.4
[f6369f11] ForwardDiff v0.10.9
[0c68f7d7] GPUArrays v2.0.1
[1d5cc7b8] IntelOpenMP_jll v2018.0.3+0
[929cbde3] LLVM v1.3.3
[856f044c] MKL_jll v2019.0.117+2
[1914dd2f] MacroTools v0.5.3
[872c559c] NNlib v0.6.4
[77ba4419] NaNMath v0.3.3
[efe28fd5] OpenSpecFun_jll v0.5.3+1
[bac558e1] OrderedCollections v1.1.0
[189a3867] Reexport v0.2.0
[ae029012] Requires v0.5.2
[276daf66] SpecialFunctions v0.9.0
[90137ffa] StaticArrays v0.12.1
[a759f4b9] TimerOutputs v0.5.3
[2a0f44e3] Base64 [`@stdlib/Base64`]
[ade2ca70] Dates [`@stdlib/Dates`]
[8ba89e20] Distributed [`@stdlib/Distributed`]
[b77e0a4c] InteractiveUtils [`@stdlib/InteractiveUtils`]
[76f85450] LibGit2 [`@stdlib/LibGit2`]
[8f399da3] Libdl [`@stdlib/Libdl`]
[37e2e46d] LinearAlgebra [`@stdlib/LinearAlgebra`]
[56ddb016] Logging [`@stdlib/Logging`]
[d6f4376e] Markdown [`@stdlib/Markdown`]
[44cfe95a] Pkg [`@stdlib/Pkg`]
[de0858da] Printf [`@stdlib/Printf`]
[3fa0cd96] REPL [`@stdlib/REPL`]
[9a3f8284] Random [`@stdlib/Random`]
[ea8e919c] SHA [`@stdlib/SHA`]
[9e88b42a] Serialization [`@stdlib/Serialization`]
[6462fe0b] Sockets [`@stdlib/Sockets`]
[2f01184e] SparseArrays [`@stdlib/SparseArrays`]
[10745b16] Statistics [`@stdlib/Statistics`]
[8dfed614] Test [`@stdlib/Test`]
[cf7118a7] UUIDs [`@stdlib/UUIDs`]
[4ec0a83e] Unicode [`@stdlib/Unicode`]
[ Info: CuArrays.jl failed to initialize and will be unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
ERROR: LoadError: CUDA error: invalid device ordinal (code 101, ERROR_INVALID_DEVICE)
Stacktrace:
[1] throw_api_error(::CUDAdrv.cudaError_enum) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/error.jl:131
[2] macro expansion at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/error.jl:144 [inlined]
[3] cuCtxCreate_v2 at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/libcuda.jl:108 [inlined]
[4] CuContext(::CuDevice, ::CUDAdrv.CUctx_flags_enum) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:73
[5] CuContext at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:72 [inlined]
[6] CuContext(::var"#5#7", ::CuDevice) at /home/amanela/.julia/packages/CUDAdrv/b1mvw/src/context.jl:118
[7] iterate at ./none:0 [inlined]
[8] collect(::Base.Generator{CUDAdrv.DeviceSet,var"#4#6"}) at ./array.jl:622
[9] top-level scope at /home/amanela/.julia/packages/CuArrays/HE8G6/test/runtests.jl:27
[10] include at ./boot.jl:328 [inlined]
[11] include_relative(::Module, ::String) at ./loading.jl:1105
[12] include(::Module, ::String) at ./Base.jl:31
[13] include(::String) at ./client.jl:424
[14] top-level scope at none:6
in expression starting at /home/amanela/.julia/packages/CuArrays/HE8G6/test/runtests.jl:27
ERROR: Package CuArrays errored during testing
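(Side note: as the Info line above suggests, the initialization failure message can be expanded before loading the package; a sketch:)

julia> ENV["JULIA_CUDA_VERBOSE"] = "true"   # expand the CuArrays init failure message

julia> using CuArrays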
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
shell> nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Thu Feb 13 16:18:56 2020
Driver Version : 418.39
CUDA Version : 10.1
Attached GPUs : 1
GPU 00000000:3D:00.0
Product Name : Tesla V100-PCIE-32GB
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0561519008380
GPU UUID : GPU-5b191013-a305-1cf6-9ca3-8c02b838eedc
Minor Number : 0
VBIOS Version : 88.00.7E.00.03
MultiGPU Board : No
Board ID : 0x3d00
GPU Part Number : 900-2G500-0010-000
Inforom Version
Image Version : G500.0202.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x3D
Device : 0x00
Domain : 0x0000
Device Id : 0x1DB610DE
Bus Id : 00000000:3D:00.0
Sub System Id : 0x124A10DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32480 MiB
Used : 0 MiB
Free : 32480 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Exclusive_Process
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 29 C
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
Memory Current Temp : 26 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 34.09 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 1230 MHz
SM : 1230 MHz
Memory : 877 MHz
Video : 1110 MHz
Applications Clocks
Graphics : 1230 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1230 MHz
Memory : 877 MHz
Max Clocks
Graphics : 1380 MHz
SM : 1380 MHz
Memory : 877 MHz
Video : 1237 MHz
Max Customer Boost Clocks
Graphics : 1380 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
You are using a device in exclusive-process compute mode, so it’s likely you have another process running that is already using the GPU for compute.
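A quick way to confirm the compute mode (and, with root privileges, to relax it) is something along these lines:

shell> nvidia-smi --query-gpu=compute_mode --format=csv
compute_mode
Exclusive_Process

shell> nvidia-smi -c DEFAULT   # requires root; allows multiple compute processes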
Thanks! I think I had a different issue with CUDNN library availability when I first ran using CuArrays in the session; then, when I called test CuArrays, the device was already in use.
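In other words, the failure pattern was presumably something like this (a sketch):

julia> using CuArrays; CuArray([1.0]);   # this REPL process now holds the device
                                         # (Exclusive_Process compute mode)

(Transformers) pkg> test CuArrays        # runs the tests in a fresh julia process,
                                         # whose cuCtxCreate then fails with
                                         # ERROR_INVALID_DEVICE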