I need to multiply two fairly large sparse matrices, C = A * B, and I would like to try using the GPU for that. The problem arises in a context where A represents a fixed 2D convolution kernel, while B represents a list of 2D images (each image being a column of B). Therefore, if necessary, I can easily change the storage scheme of A without a significant penalty.
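For concreteness, something like the following builds such an A (all sizes, and the zero boundary condition, are illustrative):

```julia
using SparseArrays

# Sketch: build the sparse matrix A that applies a k×k stencil to an n×n
# image stored as a flattened column (zero boundary; this is a correlation,
# flip the kernel for a true convolution).
n, k = 64, 5
ker = rand(k, k)
h = k ÷ 2
Is, Js, Vs = Int[], Int[], Float64[]
lin = LinearIndices((n, n))
for x in 1:n, y in 1:n               # output pixel (x, y)
    for dx in -h:h, dy in -h:h       # kernel taps
        xs, ys = x + dx, y + dy
        if 1 <= xs <= n && 1 <= ys <= n
            push!(Is, lin[x, y])
            push!(Js, lin[xs, ys])
            push!(Vs, ker[dx + h + 1, dy + h + 1])
        end
    end
end
A = sparse(Is, Js, Vs, n^2, n^2)     # C = A * B filters every column of B
```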
I can easily do this operation with CUDA, but I could not find any way to perform sparse matrix products with Metal. Assuming there is no official implementation available, does anyone know of a generic GPU implementation of sparse matrix multiplication (SpGEMM)? Ideally, one could perhaps write a suitable kernel using KernelAbstractions.jl; see the sketch below.
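Since the columns of B are images, maybe a plain sparse-times-dense (SpMM) kernel would already be enough, which is much simpler than full SpGEMM. A minimal, untested sketch of what I have in mind, with A in CSR form (all names here are illustrative):

```julia
using KernelAbstractions

# One work-item per output entry C[i, j]; A is given in CSR form as
# (rowptr, colval, nzval), B is a dense matrix on the same device.
@kernel function spmm_kernel!(C, rowptr, colval, nzval, B)
    i, j = @index(Global, NTuple)
    acc = zero(eltype(C))
    @inbounds for p in rowptr[i]:(rowptr[i + 1] - 1)
        acc += nzval[p] * B[colval[p], j]
    end
    @inbounds C[i, j] = acc
end

# Launch on whatever backend the arrays already live on (e.g. Metal's).
function spmm!(C, rowptr, colval, nzval, B)
    backend = get_backend(C)
    spmm_kernel!(backend)(C, rowptr, colval, nzval, B; ndrange=size(C))
    KernelAbstractions.synchronize(backend)
    return C
end
```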
Why don’t you try to perform the convolutions in the Fourier domain?
You precompute the FFT of the padded kernel once, and then in a loop (or stream) you multiply it element-wise with the FFT of each image and take the inverse FFT of each result.
It should be faster for 5×5 and larger kernels, even for a single image.
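Roughly like this with FFTW.jl (sizes are illustrative; padding to the full linear-convolution size avoids circular wrap-around):

```julia
using FFTW

img = rand(256, 256)
ker = rand(5, 5)
m, n = size(img) .+ size(ker) .- 1   # full linear-convolution size

pker = zeros(m, n); pker[axes(ker)...] = ker
Fk = fft(pker)                       # precomputed once, reused for all images

pimg = zeros(m, n); pimg[axes(img)...] = img
out = real.(ifft(fft(pimg) .* Fk))   # repeat this line for each image
```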
Thank you @pitsianis. The problem with this approach is that I need to perform a matrix-matrix multiplication, not a matrix-vector multiplication. The latter could be handled with a 2D FFT, but the former requires either a 3D FFT or repeated 2D FFTs, and I am not sure I will gain much given the sparsity of the matrix (but I will give it a try). Additionally, I am not sure Metal.jl has an FFT (probably not?).
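The repeated-2D-FFT variant would look something like this on the CPU (illustrative sizes):

```julia
using FFTW

n, nimg = 64, 100                 # hypothetical image and batch sizes
B = rand(n * n, nimg)             # each column is a flattened n×n image
imgs = reshape(B, n, n, nimg)
F = fft(imgs, 1:2)                # one 2D FFT per image, batched over dim 3
```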
Just for the record: I tried the FFT approach, but on the CPU it is significantly slower (by a factor of ~30) than the sparse-matrix approach. It also requires a lot of data transfer, so I do not think it is sensible to pursue this path on the GPU.
Thank you. I checked MLX, and it does have matrix multiplication. However, I did not find any indication that it has a sparse matrix type. Do you know how to use sparse matrices with MLX?