CUDA.jl - Atomic Addition Error To A Sparse Array Inside CUDA Kernel

Hello all,

I am trying to assemble many individual element contributions into a huge global system. The global system is guaranteed to be sparse, and storing all the zero elements in memory would be wasteful; I only need the nonzero entries. So I wanted to use a sparse array in my program.

If I use a classical dense “zeros” array, I don’t get an error. However, if I change the type to a sparse array, I do. Example code:

clearconsole()
# Import related libraries
using CUDA
using StaticArrays

# Assembly kernel
function gpu_arbitrary_assemblage(global_mat, indices_coor, indices_contribution)
    tidx = (blockIdx().x - 1) * blockDim().x + threadIdx().x #calculate thread ID number
    if tidx <= 10
        indices = indices_coor[:, tidx]
        CUDA.@atomic global_mat[indices[1], indices[2]] += indices_contribution[tidx] # perform atomic addition to many array locations
    end
    return nothing
end

checking = "not works 1"
# the global array definition. It starts out as all zeros, i.e. empty values.
if checking == "works"
    global_mat = CUDA.fill(0.0f0, (6, 6))
elseif checking == "not works 1"
    global_mat = CUSPARSE.CuSparseMatrixCSR(CUDA.fill(0.0f0, (6, 6)))
elseif checking == "not works 2"
    global_mat = CUSPARSE.CuSparseMatrixCSC(CUDA.fill(0.0f0, (6, 6)))
end


# the indices: which value should be added where in the global array.
# I have no problem with this part; I solved it in a previous topic,
# "CUDA.jl - Sub-Vector Indexing Problem Inside CUDA Kernel".
indices_coor = SMatrix{2, 10, Int32}([3 3 1 1 4 4 4 2 5 5;
                                      5 5 2 2 4 4 6 3 2 2])

indices_contribution = cu([0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 1, 1, 0.9, 0.9]) # random float values to be added into the global array

# display the initial global array
display(global_mat)

# check the type of the array
display(typeof(global_mat))

# launch the CUDA kernel
@cuda threads = (32, 1, 1) gpu_arbitrary_assemblage(global_mat, indices_coor, indices_contribution);

#show the result
display(global_mat)

You can test this by changing the “checking” variable to “works”, “not works 1”, or “not works 2”. The errors I get are:

  1. Reason: unsupported call through a literal pointer (call to jl_alloc_string)
  2. Reason: unsupported call through a literal pointer (call to memset)
  3. Reason: unsupported call through a literal pointer (call to jl_array_grow_end)
  4. Reason: unsupported call through a literal pointer (call to jl_array_del_end)

As you can see, if I change the type of the array, the kernel no longer compiles. Is there any way for me to use a sparse array instead of a normal array? I use the Juno editor. Thank you.

No, mutating a sparse array like that is not supported. A general setindex! may need to reallocate the sparse array's components to make room for the new element, which is currently impossible inside a kernel (and generally a very bad idea on a GPU anyway, since it would wreck performance).
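To see why, here is a minimal CPU-side illustration using plain SparseArrays (not CUDA): assigning to a position that has no stored entry forces the underlying index and value buffers to grow, which is exactly the kind of reallocation (the jl_array_grow_end calls in the error messages) that cannot happen inside a GPU kernel.

using SparseArrays

S = spzeros(Float32, 6, 6)
length(rowvals(S)), length(nonzeros(S))   # (0, 0): no entries stored yet
S[3, 5] = 0.6f0                           # setindex! inserts a new stored entry
length(rowvals(S)), length(nonzeros(S))   # (1, 1): the buffers had to grow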

Thank you, I will continue to use normal arrays, but having a feature like this would be wonderful.

It’s not going to happen, though; it’s inherently incompatible with how GPUs work. Only in specific cases where you know the sparsity structure beforehand (so the sparse array's components can be allocated and initialized on the host before the kernel launch) can we support this. For example, with broadcast: https://github.com/JuliaGPU/CUDA.jl/pull/1380
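Since the sparsity pattern is known up front in this example, one manual workaround (a sketch, not an official API; the nzVal field name and the host-side index mapping are assumptions of this illustration) is to build the CSC structure on the host, upload it, and have the kernel atomically accumulate only into the nonzero-values buffer:

using CUDA, CUDA.CUSPARSE, SparseArrays

rows = Int32[3, 3, 1, 1, 4, 4, 4, 2, 5, 5]
cols = Int32[5, 5, 2, 2, 4, 4, 6, 3, 2, 2]
vals = Float32[0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 1, 1, 0.9, 0.9]

# fix the sparsity pattern on the host, then reset the stored values to zero
A_host = sparse(rows, cols, ones(Float32, length(rows)), 6, 6)
fill!(nonzeros(A_host), 0f0)

# for every contribution, find its slot in the nonzero-values buffer (host side)
pos = map(zip(rows, cols)) do (i, j)
    r = nzrange(A_host, j)
    r[findfirst(==(i), rowvals(A_host)[r])]
end

function accumulate_nonzeros!(nzval, pos, contrib)
    k = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if k <= length(contrib)
        CUDA.@atomic nzval[pos[k]] += contrib[k]  # duplicate coordinates hit the same slot, hence the atomic
    end
    return nothing
end

A_gpu = CuSparseMatrixCSC(A_host)  # colPtr/rowVal/nzVal now live on the GPU
@cuda threads=32 accumulate_nonzeros!(A_gpu.nzVal, CuVector{Int32}(pos), CuVector{Float32}(vals))

copyto!(nonzeros(A_host), A_gpu.nzVal)  # bring the accumulated values back to the host copy
display(A_host)

The key point is that the kernel only ever writes into a preallocated dense buffer of nonzero values; all structural work (deciding where each contribution lands) happens on the host before launch.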

Actually, I know exactly where to add the values from my DegreeOfFreedom and BoundaryCondition arrays, but I have to finish my project on time, so I cannot try this new feature right now. I just wanted to use it if it existed. I will try to further optimize my program in the summer. Thank you for the answers.