I need to compute the product C = A * B of two fairly large sparse matrices, and I would like to try using the GPU for that. The problem arises in a context where A represents a fixed 2D convolution kernel, while B represents a list of 2D images (each of which is a column of B). Therefore, if necessary, I can easily change the storage scheme of A without a significant penalty.
I can easily do this operation with CUDA, but I could not find any way to perform sparse matrix products with Metal. Assuming there is no official implementation available, does anyone know of a generic GPU implementation of sparse matrix-matrix multiplication (SpGEMM)? Ideally, one could perhaps write a suitable kernel using KernelAbstractions.
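For reference, the problem setup can be sketched on the CPU with SciPy (not Metal, just an illustration of the structure; the sizes, names, and the 3x3 kernel are all made up, and B is kept dense for simplicity):

```python
import numpy as np
from scipy.sparse import lil_matrix

rng = np.random.default_rng(0)
h, w = 8, 8                       # small image size, for illustration
kernel = rng.standard_normal((3, 3))

# Build the (h*w) x (h*w) matrix A row by row: row r holds the kernel
# weights contributing to output pixel r ("same" size, zero padding,
# cross-correlation convention, i.e. no kernel flip).
n = h * w
A = lil_matrix((n, n))
for i in range(h):
    for j in range(w):
        r = i * w + j
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    A[r, ii * w + jj] = kernel[di + 1, dj + 1]
A = A.tocsr()

# B: each column is one flattened image.
images = rng.standard_normal((5, h, w))
B = images.reshape(5, n).T

C = A @ B                         # all 5 images filtered in one product
```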
Why don’t you try to perform the convolutions in the Fourier domain?
You precompute the FFT of the padded kernel once and then in a loop (or stream) you multiply it element-wise with the FFT of each of the images and then take the inverse FFT of each result.
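The suggestion above can be sketched with NumPy on the CPU (the padding scheme here gives circular convolution; pad further if linear convolution is required, and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 64
kernel = rng.standard_normal((5, 5))

# Zero-pad the kernel to the image size and take its FFT once.
K = np.zeros((h, w))
K[:5, :5] = kernel
K_hat = np.fft.rfft2(K)

# Each image then costs one forward FFT, one element-wise product,
# and one inverse FFT.
images = rng.standard_normal((10, h, w))
out = np.empty_like(images)
for i, img in enumerate(images):
    out[i] = np.fft.irfft2(np.fft.rfft2(img) * K_hat, s=(h, w))
```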
It should be faster for 5x5 and larger kernels, even for a single image.
Thank you @pitsianis. The problem with this approach is that I need to perform a matrix-matrix multiplication, not a matrix-vector multiplication. The latter could be done with a 2D FFT, but the former requires either a 3D FFT or repeated 2D FFTs, and I am not sure I will gain much given the sparsity of the matrix (but I will give it a try). Additionally, I am not sure Metal.jl has an FFT (probably not?).
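For what it's worth, the "repeated 2D FFT" need not be a loop: FFT libraries typically transform only the trailing axes, so a whole stack of images can go through one batched call. A NumPy sketch of the idea (whether Metal.jl exposes an equivalent batched FFT is a separate question, and all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
nimg, h, w = 8, 32, 32
kernel = rng.standard_normal((3, 3))

# FFT of the zero-padded kernel, computed once.
K = np.zeros((h, w))
K[:3, :3] = kernel
K_hat = np.fft.rfft2(K)            # shape (h, w//2 + 1)

imgs = rng.standard_normal((nimg, h, w))
# One batched 2D FFT over the trailing axes; K_hat broadcasts
# against the leading image axis. No 3D FFT is involved.
out = np.fft.irfft2(np.fft.rfft2(imgs) * K_hat, s=(h, w))
```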