# Question

I have an N × N matrix `A` on the GPU (e.g. `CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}` from `CUDA.jl`), and two N-element “index vectors” `iVector` and `jVector`, also on the GPU. I want to add the elements of `A` located at the indices {(i_1, j_1), (i_2, j_2), …, (i_N, j_N)}. With everything on the CPU, this is of course trivial with something like

```
sum(A[iVector[k], jVector[k]] for k in 1:N)
```

A similar approach works with everything on the GPU:

```
sum(@inbounds view(A, view(iVector, index), view(jVector, index)) for index in eachindex(iVector))
```

but my gut says this isn’t the best way to do it and is more of a workaround. **Is there a better or more correct way to do this? What about doing the summation in parallel on the GPU as well, possibly by writing a custom CUDA kernel?**
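For context, one alternative I’ve been considering is collapsing the two index vectors into a single vector of column-major linear indices, so that the gather and the reduction are each a single array operation. This is a hypothetical sketch (untimed; `sum_at_indices` is just a name I made up), written generically so it runs on plain `Array`s too, and should dispatch to GPU kernels when the arguments are `CuArray`s:

```julia
# Hypothetical sketch: gather the requested elements via linear indices,
# then reduce. For column-major storage, A[i, j] lives at linear index
# (j - 1) * size(A, 1) + i.
function sum_at_indices(A::AbstractMatrix{T},
                        iVector::AbstractVector{<:Integer},
                        jVector::AbstractVector{<:Integer}) where T <: Real
    linear = (jVector .- 1) .* size(A, 1) .+ iVector  # broadcast kernel on GPU
    return sum(A[linear])                             # gather + parallel reduction
end

# CPU usage example:
A = [1.0 2.0; 3.0 4.0]
sum_at_indices(A, [1, 2], [2, 1])  # A[1, 2] + A[2, 1] = 2.0 + 3.0
```

As far as I can tell, with `CuArray` inputs the broadcast, the gather `A[linear]`, and `sum` all stay on the device, so no scalar indexing should be triggered, but I haven’t verified that this beats the `view`-based generator above.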

# Example

```
using Random: randperm
using CUDA
using BenchmarkTools
const dimension = 10^4
const adjacencyMatrix = CUDA.rand(dimension, dimension)
const tour = convert.(Int32, randperm(dimension)) |> cu
function fitness(
        adjacencyMatrix :: Union{Matrix{T}, CuArray},
        position :: Union{Vector, CuArray}
    ) where T <: Real
    head = view(position, 1:length(position) - 1)
    tail = view(position, 2:length(position))
    return sum(
        @inbounds view(
            adjacencyMatrix,
            view(head, index),
            view(tail, index)
        ) for index in eachindex(head)
    )
end
display(@btime fitness(adjacencyMatrix, tour))
```

Output:

```
119.928 ms (240377 allocations: 8.36 MiB)
```
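As for the custom-kernel route, the rough shape I have in mind is one thread per index pair, with each thread atomically accumulating its element into a one-element device buffer. This is an untested, hypothetical sketch (`gather_sum_kernel!` and `gather_sum` are names I made up; a real implementation would likely want a shared-memory tree reduction instead of a single atomic):

```julia
using CUDA

# One thread per index pair; each thread reads A[is[k], js[k]] and
# atomically adds it into out[1].
function gather_sum_kernel!(out, A, is, js)
    k = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if k <= length(is)
        @inbounds CUDA.@atomic out[1] += A[is[k], js[k]]
    end
    return nothing
end

function gather_sum(A::CuMatrix{T}, is::CuVector, js::CuVector) where T <: Real
    out = CUDA.zeros(T, 1)
    threads = 256
    blocks = cld(length(is), threads)
    @cuda threads=threads blocks=blocks gather_sum_kernel!(out, A, is, js)
    return CUDA.@allowscalar out[1]
end
```

A single global atomic will serialize under contention, so I’d expect CUDA.jl’s built-in `mapreduce`/`sum` machinery to be faster in practice; the kernel is mainly here to make the question concrete.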