@cuDynamicSharedMem: allocating beforehand?


I am getting ERROR: LoadError: CUDA error: an illegal memory access was encountered (code #700, ERROR_ILLEGAL_ADDRESS) with the following code:

using CUDAdrv, CUDAnative

function kernel(x)
    i = threadIdx().x
    shared = @cuDynamicSharedMem(Int64,1)
    if i == 1
        shared[1] = 255
    end
    x[i] = shared[1]
    return nothing
end

d_x = CuArray{Int64,1}(10)
@cuda (1, 10) kernel(d_x)
x = Array(d_x)

The error probably occurs as soon as I try

shared[1] = 255

The source file CUDAnative.jl/src/device/intrinsics/memory_shared.jl mentions:

Dynamic shared memory also needs to be allocated beforehand, when calling the kernel.

However, I cannot find an example of how to do this.


Changing to @cuStaticSharedMem fixed all errors.


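This is by design: dynamic shared memory in CUDAnative.jl behaves exactly like dynamic shared memory in CUDA, i.e. you need to specify how many bytes to allocate at the launch site: @cuda (blocks, threads[, shmem[, stream]]) kernel(args). Your launch, @cuda (1, 10), passes no shmem value, so no shared memory is allocated and the write to shared[1] touches an invalid address. With static shared memory you specify the number of elements inside the kernel, so the amount of memory can be deduced at compile time.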
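
For illustration, here is a minimal sketch of the question's kernel launched with the shmem entry filled in. It assumes the tuple-style @cuda syntax quoted above and the same CUDAdrv/CUDAnative versions as the question. The sync_threads() call is my addition, not part of the original code: without a barrier, the other threads could read shared[1] before thread 1 has written it.

```julia
using CUDAdrv, CUDAnative

function kernel(x)
    i = threadIdx().x
    # View of one Int64 in dynamic shared memory; the backing
    # storage is only reserved if the launch requests it.
    shared = @cuDynamicSharedMem(Int64, 1)
    if i == 1
        shared[1] = 255
    end
    # Barrier (added here, not in the question's code) so that every
    # thread reads the value only after thread 1 has written it.
    sync_threads()
    x[i] = shared[1]
    return nothing
end

d_x = CuArray{Int64,1}(10)
# Third tuple entry is the dynamic shared memory size in bytes:
# one Int64 element, so sizeof(Int64).
@cuda (1, 10, sizeof(Int64)) kernel(d_x)
x = Array(d_x)
```

With the allocation in place, the illegal memory access should go away and every element of x should come back as 255. If the kernel needed n elements instead of one, the launch would request n * sizeof(Int64) bytes.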