How do I reduce into an array index on a GPU?

Hello all,

This might be too niche, but I am doing the following sparse tensor times matrix multiplication (summing over the repeated indices i, j, k):
A(n,m,l) = B(i,j,k) C(i,n) C(j,m) C(k,l)

I can calculate a single element A[n,m,l] of the tensor A, where B is stored as a vector of COO entries:

function gpu_k3_kernel(B, C_n, C_m, C_l)
    # map over the nonzeros of B and sum their contributions to one element of A
    f = (B_data) -> B_data.val * C_n[B_data.i] * C_m[B_data.j] * C_l[B_data.k]
    return mapreduce(f, +, B)
end
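
For context, filling all of A means calling this once per element, roughly like the sketch below (my own conventions here, nothing from a library: C is assumed to be a matrix with one column per mode, and A is a host array that just collects the scalar results):

N_modes = size(C, 2)
A = zeros(N_modes, N_modes, N_modes)
for l in 1:N_modes, m in 1:N_modes, n in 1:N_modes
    # one mapreduce launch per element of A
    A[n, m, l] = gpu_k3_kernel(B, view(C, :, n), view(C, :, m), view(C, :, l))
end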

but this has a horrible memory access pattern, since the indices i, j, k access C essentially at random, so the kernel runs into memory bandwidth issues quickly. Not to mention I have to re-launch this kernel for every element of A.

I can rewrite this kernel to do the entire multiplication at once with the code below, so that C is accessed in a more organized manner. The only problem is that now I am accumulating into the array A, which I am not really sure how to handle in an easy way (I'm not an expert GPU programmer). Is there some way to do this without having to write my own reduce kernel? Any help would be greatly appreciated. Thanks!

for b in eachindex(B)
    for o in 1:N_modes
        for n in 1:o
            for m in 1:n
                A[m, n, o] += B[b].val * C[B[b].i, m] * C[B[b].j, n] * C[B[b].k, o]
            end
        end
    end
end
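
In case it makes the question clearer, the closest thing I can picture is launching one thread per nonzero of B and accumulating into A with atomic adds, something like the sketch below (the B_val/B_i/B_j/B_k arrays and the kernel name are just placeholders for however B ends up laid out on the device; I genuinely don't know if this is a sensible approach, which is really what I'm asking):

using CUDA

function k3_accum_kernel!(A, B_val, B_i, B_j, B_k, C, N_modes)
    # one thread per nonzero of B
    b = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if b <= length(B_val)
        i, j, k = B_i[b], B_j[b], B_k[b]
        v = B_val[b]
        for o in 1:N_modes, n in 1:o, m in 1:n
            # different threads can hit the same A[m, n, o], so the add has to be atomic
            CUDA.@atomic A[m, n, o] += v * C[i, m] * C[j, n] * C[k, o]
        end
    end
    return nothing
end

# launched with something like
# threads = 256
# blocks  = cld(length(B_val), threads)
# @cuda threads=threads blocks=blocks k3_accum_kernel!(A, B_val, B_i, B_j, B_k, C, N_modes)

But I have no feel for whether the contention on the atomics would kill performance, or whether there is a cleaner way to get this accumulation with the existing mapreduce-style machinery.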