I am working on porting (parts of) a hydrological model to GPU to improve performance. This is largely going well thanks to the KernelAbstractions.jl and AcceleratedKernels.jl packages, which make it quite straightforward to write code that can run on both CPU and GPU.
However, there is one part of the model that I am unsure how to port to GPU properly.
The model performs computations on a graph network, where a mapping of source and destination nodes/edges is stored as a ragged array.
A simplified version of the scheme I am trying to solve would be:
input = Vector{Float64}(1:6)
output = zeros(3)
io_map = [
    [1],
    [2, 3],
    [4, 5, 6],
]

for i in eachindex(output)
    output[i] = sum(input[io_map[i]])
end
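(For reference, this gives output == [1.0, 5.0, 15.0].)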
This structure is very similar to TensorFlow's ragged tensors, which do seem to be supported by the underlying stacks (ROCm and CUDA, at least).
Ideally I would use these ragged tensors, as it would keep the code structure close to what it is now, but I don't see any support or documentation for this in AMDGPU.jl, CUDA.jl, or any other package. Has anyone else encountered and/or solved this already?
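In case it helps make the question concrete: as far as I understand, a ragged tensor is essentially a flat buffer plus row offsets, so the hand-rolled fallback I can think of is a CSR-style layout with a small KernelAbstractions kernel over it. A rough sketch of what I mean (gather_sum!, indices and offsets are just placeholder names I made up):

using KernelAbstractions

# Flattened version of io_map: all indices concatenated, plus offsets such that
# output[i] gathers input[indices[offsets[i]:offsets[i+1]-1]]
indices = [1, 2, 3, 4, 5, 6]
offsets = [1, 2, 4, 7]

@kernel function gather_sum!(output, @Const(input), @Const(indices), @Const(offsets))
    i = @index(Global)
    acc = zero(eltype(output))
    @inbounds for k in offsets[i]:(offsets[i + 1] - 1)
        acc += input[indices[k]]
    end
    @inbounds output[i] = acc
end

input = Vector{Float64}(1:6)
output = zeros(3)

# On GPU these would be ROCArray/CuArray instead of Vector; the kernel itself stays the same.
backend = get_backend(output)
gather_sum!(backend)(output, input, indices, offsets; ndrange = length(output))
KernelAbstractions.synchronize(backend)

This gives the same [1.0, 5.0, 15.0] as the loop above, but it means maintaining the flattened indices and offsets myself, which is why native ragged tensor support would be nicer.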
The alternative would be to use matrix multiplication:
...
io_mat = Int8[
    1 0 0
    0 1 0
    0 1 0
    0 0 1
    0 0 1
    0 0 1
]
output = sum(io_mat .* input; dims = 1)
But the downside of this is that 1. I would need to overhaul the code and logic a lot, and 2. I would need to get sparse matrices working on my GPU (AMD), which ROCm doesn't seem to support out-of-the-box.
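For completeness, the dense reduction above is equivalent to a (transposed) matrix-vector product, and on the CPU the sparse version is easy with the SparseArrays standard library; it is the GPU side (presumably rocSPARSE-backed arrays through AMDGPU.jl) that I have not figured out:

using SparseArrays

input = Vector{Float64}(1:6)

# Sparse equivalent of io_mat; io_mat' * input reproduces the gather loop
# result, i.e. [1.0, 5.0, 15.0]
io_mat = sparse(Int8[1 0 0; 0 1 0; 0 1 0; 0 0 1; 0 0 1; 0 0 1])
output = io_mat' * input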