Illegal memory access problem in CUDA

I am creating some dynamic shared memory Boolean arrays in a kernel, and it consistently gives me:

ERROR: LoadError: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\error.jl:105
 [2] query
   @ C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:102 [inlined]
 [3] synchronize(stream::CuStream; blocking::Bool)
   @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:130
 [4] synchronize (repeats 2 times)
   @ C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:117 [inlined]
 [5] unsafe_copyto!(dest::Vector{UInt16}, doffs::Int64, src::CuArray{UInt16, 1, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
   @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\src\array.jl:389
 [6] copyto!
   @ C:\Users\1\.julia\packages\CUDA\9T5Sq\src\array.jl:349 [inlined]
 [7] getindex(xs::CuArray{UInt16, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
   @ GPUArrays C:\Users\1\.julia\packages\GPUArrays\3sW6s\src\host\indexing.jl:89
 [8] top-level scope
   @ c:\GitHub\GitHub\NuclearMedEval\src\playgrounds\convolutionsPlay.jl:71

I am wondering what is wrong here. My assumption is that the true size of a Boolean array in bytes is its number of entries divided by 8, i.e. 1 bit per entry. Am I correct?


using CUDA
dataBdim = (32,24,32)
fp = CUDA.zeros(UInt16,1)
sumInBits = (dataBdim[1]+2)+(dataBdim[2]+2)+(dataBdim[3]+2)+dataBdim[1]+dataBdim[2]+dataBdim[3]
shmemSum = cld(sumInBits,8) # in bytes
function testKernelA(dataBdim, fp)
    resShmem = @cuDynamicSharedMem(Bool, ((dataBdim[1]+2), (dataBdim[2]+2), (dataBdim[3]+2)))
    sourceShmem = @cuDynamicSharedMem(Bool, (dataBdim[1], dataBdim[2], dataBdim[3]))
    # naive loop just for presentation of problem
    for i in 1:(dataBdim[1]+2), j in 1:(dataBdim[2]+2), n in 1:(dataBdim[3]+2)
        resShmem[i,j,n] = false
    end

    for i in 1:(dataBdim[1]), j in 1:(dataBdim[2]), n in 1:(dataBdim[3])
        sourceShmem[i,j,n] = false
    end
    fp[1] = 1
    return
end
@cuda threads=(32,5) blocks=(2) shmem=shmemSum testKernelA(dataBdim, fp)
fp[1]

How is this ‘in bits’ if you’re nowhere multiplying by sizeof(UInt16) (or 8*sizeof if you actually want this size to be bits)?

You are right. Still, changing it to

sumInBits = (dataBdim[1]+2)*(dataBdim[2]+2)*(dataBdim[3]+2)+dataBdim[1]*dataBdim[2]*dataBdim[3]

does not solve the problem. But is sizeof needed if this is a Boolean array? Is it not just a bit array?

There’s still no sizeof in that expression. And it is needed: CuArray{Bool} doesn’t have the same bitarray-like optimization implemented, so each Bool entry takes a full byte.
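To make the sizing concrete, here is a minimal sketch of the byte arithmetic for the two arrays in the question, assuming one byte per Bool as stated above. The names resBytes, srcBytes, srcOffset and shmemBytes are made up for illustration. It also shows two things the thread does not spell out: the second shared array needs a byte offset so that it does not overlap the first, and the total for these dimensions is larger than the 48 KiB of dynamic shared memory that most devices allow by default.

```julia
# Block dimensions from the question.
dataBdim = (32, 24, 32)

# Element counts are products of the dimensions, not sums.
resLen = prod(dataBdim .+ 2)   # padded array: 34 * 26 * 34 entries
srcLen = prod(dataBdim)        # unpadded array: 32 * 24 * 32 entries

# A Bool occupies one byte, so here the byte count equals the element count.
resBytes = resLen * sizeof(Bool)   # 30056
srcBytes = srcLen * sizeof(Bool)   # 24576

# The second array must start after the first one (offset in bytes).
srcOffset = resBytes
shmemBytes = resBytes + srcBytes   # 54632

# Compare against the usual 48 KiB default limit (an assumption about the device).
fitsDefault = shmemBytes <= 48 * 1024   # false for these dimensions
println((resBytes, srcBytes, shmemBytes, fitsDefault))
```

With these numbers, shmemBytes is what would be passed as shmem=, and srcOffset is what the second shared array would take as its offset argument. Because the total does not fit the default limit, either the block dimensions have to shrink or the data has to be packed more tightly.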

Also, try out CUDA.jl#master. There, the dynamic memory accesses are bounds-checked, so you will get a BoundsError instead of crashing CUDA with an illegal memory access.
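Putting the sizing advice into a kernel, a hedged sketch could look like the following. It assumes the function form CuDynamicSharedArray(T, dims, offset) from newer CUDA.jl releases (the @cuDynamicSharedMem macro in the question takes the same arguments), and it uses a hypothetical smaller block of (16, 8, 16) so that both Bool arrays fit in 48 KiB. The names sketchKernel and runSketch are invented for this example.

```julia
using CUDA

function sketchKernel(dims, padded, srcOffset, fp)
    # First array starts at byte 0 of the dynamic shared memory.
    res = CuDynamicSharedArray(Bool, padded)
    # Second array starts after the first; the offset is given in bytes.
    src = CuDynamicSharedArray(Bool, dims, srcOffset)
    # Naive loops, as in the question: every thread writes every entry.
    for i in 1:padded[1], j in 1:padded[2], n in 1:padded[3]
        res[i, j, n] = false
    end
    for i in 1:dims[1], j in 1:dims[2], n in 1:dims[3]
        src[i, j, n] = false
    end
    fp[1] = UInt16(1)
    return
end

# dims defaults to a hypothetical smaller block than the one in the question.
function runSketch(dims = (16, 8, 16))
    padded = dims .+ 2
    resBytes = prod(padded) * sizeof(Bool)   # bytes for the padded array
    srcBytes = prod(dims) * sizeof(Bool)     # bytes for the unpadded array
    fp = CUDA.zeros(UInt16, 1)
    @cuda threads=(32, 5) blocks=2 shmem=(resBytes + srcBytes) sketchKernel(dims, padded, resBytes, fp)
    # Copying to the host waits for the kernel and avoids scalar indexing of a CuArray.
    return Array(fp)[1]
end
```

Calling runSketch() on a machine with a working GPU launches the kernel and returns the flag written by fp[1]. The only changes relative to the question are the byte-based shmem size and the offset for the second array.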

OK, so can I use a bit type in shared memory?

And what do you mean by master? I suppose you deduced that I am using some branch, which is not my intention. I had read somewhere that the shared memory initialization macro should now be a function. Is this what you mean?

Thanks!

The master branch on GitHub, i.e. ] add CUDA#master

You’re probably on the latest stable release (if you didn’t do anything fancy).


OK, thanks.

So I think I understand it now :slightly_smiling_face:. Still, is there a way to use a 3-dimensional bit array in shared memory? It would be extremely useful.

No, the BitArray optimization has not been implemented for CuArray. Just use a regular Bool array. If space is a problem, you’ll need to look into implementing BitArray’s packed layout.
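As a rough illustration of what a packed layout involves, the sketch below stores a 3-D Boolean volume in UInt32 words, one bit per entry. These helpers (bitlocation, packedlength, setbit!, getbit) are hypothetical and not part of CUDA.jl. They are written as plain Julia and shown on a host Vector; the same index arithmetic could be applied to a UInt32 shared array inside a kernel.

```julia
# Hypothetical helpers, not part of CUDA.jl.

# Map 1-based (i, j, k) to a 1-based word index and a single-bit mask.
function bitlocation(dims::NTuple{3,Int}, i, j, k)
    lin = (i - 1) + dims[1] * ((j - 1) + dims[2] * (k - 1))  # 0-based, column-major
    word = (lin >> 5) + 1              # 32 bits per word
    mask = UInt32(1) << (lin & 31)     # bit position inside the word
    return word, mask
end

# Number of UInt32 words needed to hold prod(dims) bits.
packedlength(dims) = cld(prod(dims), 32)

function setbit!(words, dims, i, j, k, value::Bool)
    word, mask = bitlocation(dims, i, j, k)
    words[word] = value ? (words[word] | mask) : (words[word] & ~mask)
    return words
end

function getbit(words, dims, i, j, k)
    word, mask = bitlocation(dims, i, j, k)
    return (words[word] & mask) != 0
end

dims = (32, 24, 32)
words = zeros(UInt32, packedlength(dims))   # 768 words, i.e. 3072 bytes
setbit!(words, dims, 3, 5, 7, true)
println(getbit(words, dims, 3, 5, 7))       # true
println(getbit(words, dims, 4, 5, 7))       # false
```

For the (32, 24, 32) block this takes 3072 bytes instead of 24576. The cost is that a write is a read-modify-write of a whole word, so on the GPU two threads updating different bits of the same word would race unless the update is atomic or each word is owned by a single thread.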
