CUDA.jl - Sub-Vector Indexing Problem Inside CUDA Kernel

Hello all,

I’ve been experimenting with CUDA.jl. I previously used CUDA from Python, but found it quite limited and restrictive, so I migrated to Julia. Thanks to everybody working on it for the great effort and the flexible framework.

I am currently implementing a computational mechanics code, but it is too long and complicated to show the problem in directly, so I will try to describe my problem with a sample code.

Assume that one array holds array locations like [1, 2], another array holds values like 0.6 to add at those locations, and we have a global array that collects the results.

clearconsole()
# Import related libraries
using CUDA
using StaticArrays

# Assembly kernel
function gpu_arbitrary_assemblage(global_mat, indices_coor, indices_contribution)
    tidx = (blockIdx().x - 1) * blockDim().x + threadIdx().x # compute the global thread index
    if tidx <= 10
        indices = indices_coor[:, tidx] # slice out this thread's target location (the line in question)
        CUDA.@atomic global_mat[indices[1], indices[2]] += indices_contribution[tidx] # atomic addition into the global array
    end
    end
    return nothing
end

checking = "works"
# Corresponding array locations; for example, add 0.6 to global_mat[3, 5]
if checking == "works"
    indices_coor = SMatrix{2, 10, Int32}([3 3 1 1 4 4 4 2 5 5;
                                          5 5 2 2 4 4 6 3 2 2])
elseif checking == "not works"
    indices_coor = cu([3 3 1 1 4 4 4 2 5 5;
                       5 5 2 2 4 4 6 3 2 2])
end

indices_contribution = cu([0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 1, 1, 0.9, 0.9]) # random float numbers to be added to the global array
global_mat = CUDA.fill(0.0f0, (6, 6))
display(global_mat)
@cuda threads = (32, 1, 1) gpu_arbitrary_assemblage(global_mat, indices_coor, indices_contribution);
display(global_mat)

If we create the array of locations as a static matrix, the kernel works. However, if we create it as a CUDA array, we get an error. You can try this yourself by switching the “checking” variable between “works” and “not works”. The error is: “Reason: unsupported call through a literal pointer (call to jl_alloc_array_1d)”.

Is this a bug? Am I doing something wrong, or initializing some variables incorrectly? This functionality is crucial for me. Static arrays are not mutable, and at some point I will have to change some variables: the array that changes will hold the coordinates of a system in the end. That is my problem. I would accept any kind of solution, including Setfield.jl or another library for mutating static arrays and seeing the mutation from the kernel. However, if I can just use CuArrays for this problem, my code will be more elegant and easier to manipulate.

TL;DR: Inside a CUDA kernel, I cannot extract sub-vectors from a larger array if it is a CuArray; if it is a static array, I can. However, static arrays do not fit my needs: I have to use CuArrays and mutable structs. The error is: “Reason: unsupported call through a literal pointer (call to jl_alloc_array_1d)”. For example, a line like “sub_vector = larger_vector[:, tidx]” is not permitted inside a kernel.

If anybody can help, I will be so glad. Thank you.

Slicing an array like that creates a new array, and allocations like that are not allowed in device code. StaticArrays implements slicing by returning another StaticArray, which is GPU compatible, but in general it is better to use a view here, or just to iterate over the indices manually.
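
For concreteness, here is a minimal sketch of the sample kernel rewritten with a view (the name gpu_arbitrary_assemblage_view is just for illustration; everything else matches the code above). A view wraps the existing device memory instead of allocating a new array, so it is allowed inside a kernel:

# Assembly kernel using @view: no device-side allocation
function gpu_arbitrary_assemblage_view(global_mat, indices_coor, indices_contribution)
    tidx = (blockIdx().x - 1) * blockDim().x + threadIdx().x # compute the global thread index
    if tidx <= 10
        indices = @view indices_coor[:, tidx] # non-allocating sub-vector into the existing memory
        CUDA.@atomic global_mat[indices[1], indices[2]] += indices_contribution[tidx]
    end
    return nothing
end

Equivalently, you can skip the sub-vector entirely and index the columns directly, e.g. CUDA.@atomic global_mat[indices_coor[1, tidx], indices_coor[2, tidx]] += indices_contribution[tidx]. The launch stays the same: @cuda threads = (32, 1, 1) gpu_arbitrary_assemblage_view(global_mat, indices_coor, indices_contribution).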

Thank you so much. Using a view did not cause any problems. I only use that particular sub-vector for reading information, not for manipulation.