CUDA_ERROR_ILLEGAL_ADDRESS with CuArrays/Zygote getindex

tanhevg · December 27, 2019, 7:00pm

Hello, I am getting a CuError(CUDA_ERROR_ILLEGAL_ADDRESS) with the stack trace below while taking gradients of a deep learning model with Zygote. Is this a symptom of running out of GPU memory, or a bug in CuArrays/Zygote?

The Zygote line where it fails belongs to ∇getindex. My code indexes into a CuArray using an array of integers held in host (CPU) memory, like this:

x[idc, :] # typeof(x) == CuArray{Float32, 2}; typeof(idc) == Vector{Int}

ERROR: LoadError: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] cuMemcpyHtoD_v2(::CUDAdrv.CuPtr{Int64}, ::Ptr{Int64}, ::Int64) at /rds/general/user/et517/home/.julia/packages/CUDAdrv/3EzC1/src/error.jl:123
 [2] #unsafe_copyto!#6 at /rds/general/user/et517/home/.julia/packages/CUDAdrv/3EzC1/src/memory.jl:285 [inlined]
 [3] unsafe_copyto! at /rds/general/user/et517/home/.julia/packages/CUDAdrv/3EzC1/src/memory.jl:278 [inlined]
 [4] copyto! at /rds/general/user/et517/home/.julia/packages/CuArrays/ZYCpV/src/array.jl:254 [inlined]
 [5] copyto!(::CuArrays.CuArray{Int64,1,Nothing}, ::Array{Int64,1}) at /rds/general/user/et517/home/.julia/packages/GPUArrays/1wgPO/src/abstractarray.jl:118
 [6] convert(::Type{CuArrays.CuArray}, ::Array{Int64,1}) at /rds/general/user/et517/home/.julia/packages/GPUArrays/1wgPO/src/construction.jl:84
 [7] _adapt_structure at /rds/general/user/et517/home/.julia/packages/CuArrays/ZYCpV/src/array.jl:237 [inlined]
 [8] adapt_structure at /rds/general/user/et517/home/.julia/packages/Adapt/aeQPS/src/base.jl:12 [inlined]
 [9] adapt at /rds/general/user/et517/home/.julia/packages/Adapt/aeQPS/src/Adapt.jl:6 [inlined]
 [10] adapt_structure(::CUDAnative.Adaptor, ::SubArray{Float32,2,CuArrays.CuArray{Float32,2,Nothing},Tuple{Array{Int64,1},Base.Slice{Base.OneTo{Int64}}},false}) at /rds/general/user/et517/home/.julia/packages/CuArrays/ZYCpV/src/subarray.jl:63
 [11] adapt at /rds/general/user/et517/home/.julia/packages/Adapt/aeQPS/src/Adapt.jl:6 [inlined]
 [12] cudaconvert at /rds/general/user/et517/home/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:211 [inlined]
 [13] map at ./tuple.jl:141 [inlined]
 [14] macro expansion at /rds/general/user/et517/home/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:174 [inlined]
 [15] macro expansion at ./gcutils.jl:87 [inlined]
 [16] macro expansion at /rds/general/user/et517/home/.julia/packages/CUDAnative/RhbZ0/src/execution.jl:173 [inlined]
 [17] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::SubArray{Float32,2,CuArrays.CuArray{Float32,2,Nothing},Tuple{Array{Int64,1},Base.Slice{Base.OneTo{Int64}}},false}, ::Tuple{SubArray{Float32,2,CuArrays.CuArray{Float32,2,Nothing},Tuple{Array{Int64,1},Base.Slice{Base.OneTo{Int64}}},false},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{SubArray{Float32,2,CuArrays.CuArray{Float32,2,Nothing},Tuple{Array{Int64,1},Base.Slice{Base.OneTo{Int64}}},false},Tuple{Bool,Bool},Tuple{Int64,Int64}},Base.Broadcast.Extruded{CuArrays.CuArray{Float32,2,Nothing},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /rds/general/user/et517/home/.julia/packages/CuArrays/ZYCpV/src/gpuarray_interface.jl:62
 [18] gpu_call at /rds/general/user/et517/home/.julia/packages/GPUArrays/1wgPO/src/abstract_gpu_interface.jl:151 [inlined]
 [19] gpu_call at /rds/general/user/et517/home/.julia/packages/GPUArrays/1wgPO/src/abstract_gpu_interface.jl:128 [inlined]
 [20] copyto! at /rds/general/user/et517/home/.julia/packages/GPUArrays/1wgPO/src/broadcast.jl:48 [inlined]
 [21] copyto! at ./broadcast.jl:842 [inlined]
 [22] materialize! at ./broadcast.jl:801 [inlined]
 [23] (::getfield(Zygote, Symbol("##984#986")){CuArrays.CuArray{Float32,2,CuArrays.CuArray{Float32,4,Nothing}},Tuple{Array{Int64,1},Colon}})(::CuArrays.CuArray{Float32,2,Nothing}) at /rds/general/user/et517/home/.julia/packages/Zygote/N2BNN/src/lib/array.jl:38
...
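For readers following the trace: frames [20] to [23] show an in-place broadcast with `+` into a SubArray of a CuArray whose row indices are a host `Array{Int64,1}`, and frames [5] to [10] show that index vector being converted to a CuArray at kernel launch. A minimal CPU-only sketch of that gather/scatter pattern follows. It is an illustration under assumptions, not Zygote's actual source, and the names `gather_rows` and `gather_rows_pullback` are hypothetical.

```julia
# CPU-only sketch of row gathering and a matching gradient step.
# `gather_rows` and `gather_rows_pullback` are hypothetical names used for illustration.

# Forward pass: select rows of x with an integer index vector.
gather_rows(x::AbstractMatrix, idc::AbstractVector{<:Integer}) = x[idc, :]

# Backward pass: write the incoming gradient dy into the selected rows of a zero buffer.
function gather_rows_pullback(x::AbstractMatrix, idc::AbstractVector{<:Integer}, dy::AbstractMatrix)
    dx = zero(x)            # gradient buffer with the same shape as x
    dxv = view(dx, idc, :)  # view that reuses the index vector from the forward pass
    dxv .+= dy              # in-place broadcast into the view
    return dx
end

x = Float32[1 2; 3 4; 5 6]
idc = [3, 1]                # distinct row indices, kept on the host
y = gather_rows(x, idc)
dx = gather_rows_pullback(x, idc, ones(Float32, size(y)))
```

On the GPU the same pattern needs the index vector on the device before the kernel can run, which is the host-to-device copy (`cuMemcpyHtoD_v2`) where the trace reports the error.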

maleadt · December 28, 2019, 1:12pm

Don’t you see a kernel exception some time earlier? If you do, you can run with julia -g2 to see more details. If you don’t, try running with --check-bounds=yes.
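As a rough sketch of how to act on this advice, the snippet below prints the options the current session was started with, so you can confirm that `-g2` and `--check-bounds=yes` took effect, and validates an index vector on the host before it is used for indexing. `Base.JLOptions()` is an internal, undocumented interface and may change between Julia versions; `rows_in_bounds` is a hypothetical helper name.

```julia
# Report the options this Julia session was started with.
# Base.JLOptions() is internal, so treat the field names as an assumption.
opts = Base.JLOptions()
println("debug level (-g): ", opts.debug_level)
println("check-bounds:     ", opts.check_bounds)  # 0 = default, 1 = yes, 2 = no

# Host-side validation of row indices; `rows_in_bounds` is a hypothetical helper.
rows_in_bounds(x::AbstractMatrix, idc::AbstractVector{<:Integer}) =
    checkbounds(Bool, x, idc, :)

x = zeros(Float32, 4, 2)
println(rows_in_bounds(x, [1, 4]))  # both indices lie within 1:4
println(rows_in_bounds(x, [0, 5]))  # both indices lie outside 1:4
```

Checking indices on the host like this does not depend on how the GPU kernel was compiled, so it can help rule out a plain out-of-range index as the cause.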

tanhevg · January 2, 2020, 6:20pm

I get lots of error messages like this:

error in running finalizer: CUDAdrv.CuError(code=CUDAdrv.cudaError_enum(0x000002bc), meta=nothing)

I tried running with julia -g2 --check-bounds=yes, but got the same error.

This must be related to running out of memory, because the error goes away when I reduce the minibatch size.

maleadt · January 3, 2020, 6:48am

CUDA errors persist: once one has occurred, later API calls keep reporting it. The errors you are seeing in the finalizers are therefore the same error you caught once earlier (as reported in your first post), i.e., CUDA_ERROR_ILLEGAL_ADDRESS.
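The finalizer message prints the error code in hexadecimal (0x000002bc), while the original error message reports it in decimal (code 700). A quick check that the two are the same number:

```julia
# Error code as printed in the finalizer message.
code = 0x000002bc

# Convert to a decimal integer for comparison with "code 700" in the first error message.
println(Int(code))
```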

Are you switching devices, perhaps?

tanhevg · January 3, 2020, 12:07pm

I am not switching devices, although the server where I run this and get the error has multiple GPUs.

When I run this on a less powerful machine with a single GPU and less memory, I get a nice “Out of GPU memory” error.
