Sparse matrix multiplication for Metal

I need to perform a multiplication between two fairly large sparse matrices C = A * B and I would like to try using the GPU for that. The problem arises in a context where A represents a fixed 2D convolution kernel, while B represents a list of 2D images (each of which is a column of B). Therefore, if necessary, I can easily change the storage scheme of A without a significant penalty.

I can easily do this operation with CUDA, but I could not find any way to perform sparse matrix products with Metal. Assuming there is no official implementation available, does anyone know of a generic implementation of sparse matrix multiplication (SpGEMM) for GPUs? Ideally, one could perhaps write a suitable kernel using KernelAbstractions.
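To make the setup concrete, here is a minimal CPU sketch of the operation I have in mind (the stencil, sizes, and densities below are made up for illustration; the real A comes from my actual convolution kernel):

```julia
using SparseArrays

n = 256    # image side length (made up for illustration)
m = 1000   # number of images (made up)

# hypothetical separable 3-point smoothing stencil as a banded 1D operator
K1 = spdiagm(-1 => fill(0.25, n - 1), 0 => fill(0.5, n), 1 => fill(0.25, n - 1))

# 2D convolution operator on vectorized images: vec(K1 * X * K1') == A * vec(X)
A = kron(K1, K1)

# each column of B is a vectorized image
B = sprand(n^2, m, 0.05)

C = A * B   # the product I would like to offload to the GPU
```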


Did you try AppleAccelerate?

@LaurentPlagne, as far as I know AppleAccelerate does not use the GPU.

For sparse arrays you can try Sparse Arrays · The Julia Language.

Why don’t you try performing the convolutions in the Fourier domain?
You precompute the FFT of the padded kernel once; then, in a loop (or stream), you multiply it element-wise with the FFT of each image and take the inverse FFT of each result.

It should be faster for 5×5 and larger kernels, even for a single image.
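Something along these lines (a minimal sketch with FFTW.jl; the kernel and sizes are placeholders, and note that a plain FFT gives circular convolution, so the padding matters):

```julia
using FFTW

n = 256                           # padded image side (placeholder)
kernel = zeros(n, n)
kernel[1:3, 1:3] .= 1 / 9         # hypothetical 3×3 box kernel, zero-padded

Kf = fft(kernel)                  # precompute the kernel spectrum once

# circular convolution of one image with the precomputed kernel spectrum
conv_fft(img) = real(ifft(Kf .* fft(img)))

imgs = [rand(n, n) for _ in 1:4]
results = map(conv_fft, imgs)
```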

My question was about Metal, i.e. the Apple GPU library. In other words, I want to use the GPU to speed up the computation.

Thank you @pitsianis. The problem with this approach is that I need to perform a matrix-matrix multiplication, not a matrix-vector multiplication. The latter could be done with a 2D FFT, but the former requires either a 3D FFT or repeated 2D FFTs, and I am not sure I will gain much given the sparsity of the matrix (but I will give it a try). Additionally, I am not sure Metal.jl has FFT (probably not?).
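For completeness, the repeated 2D FFT can at least be expressed as a single batched call on the CPU, since FFTW transforms only along the requested dimensions (sizes and kernel are again placeholders):

```julia
using FFTW

n, m = 256, 100                  # placeholder sizes
imgs = rand(n, n, m)             # m images stacked along the third dimension

kernel = zeros(n, n)
kernel[1:3, 1:3] .= 1 / 9        # hypothetical padded kernel

Kf = fft(kernel)
F = fft(imgs, (1, 2))            # batched 2D FFT: one transform per slice
out = real(ifft(Kf .* F, (1, 2)))
```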

Maybe take a look at this issue on Metal.jl.


There are some hardware-agnostic sparse matrix types and kernels by @AntoineBut in GitHub - AntoineBut/GPUGraphs.jl: Workspace for my semester project in HPNALGS @ EPFL

Just for the record: I tried FFT, but on the CPU it is significantly slower (by a factor of ~30) than the sparse matrix approach. It also requires a lot of data transfer, so I do not think it is sensible to try this path on the GPU.
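The comparison was roughly of this shape (a sketch with BenchmarkTools.jl, reusing the names from my snippets above; my actual sizes differ):

```julia
using BenchmarkTools, FFTW, SparseArrays

# sparse route: A, B as in the SparseArrays sketch above
@btime $A * $B

# FFT route: Kf, imgs as in the batched FFT sketch above
@btime real(ifft($Kf .* fft($imgs, (1, 2)), (1, 2)))
```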

Thank you @gdalle. I had a look and could only find matrix-vector kernels, but perhaps I am missing something.


Maybe MLX has it. I'd say MLX.jl does not have it yet, though.

Oh right, I had missed the matmat aspect. Perhaps these basic matvec kernels can be a starting point, though.
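For instance, extending a row-per-thread matvec kernel to sparse-times-dense is fairly direct (a minimal untested sketch with a CSR layout; a true SpGEMM, with a sparse B and a sparse output C, is a much harder problem):

```julia
using KernelAbstractions

# CSR sparse A times dense B: one work-item per entry of C
@kernel function spmm_csr!(C, rowptr, colval, nzval, B)
    i, j = @index(Global, NTuple)
    acc = zero(eltype(C))
    for k in rowptr[i]:(rowptr[i+1] - 1)
        acc += nzval[k] * B[colval[k], j]
    end
    C[i, j] = acc
end

# launch on whatever backend the arrays live on (e.g. MtlArray buffers):
# backend = get_backend(C)
# spmm_csr!(backend)(C, rowptr, colval, nzval, B; ndrange = size(C))
# KernelAbstractions.synchronize(backend)
```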


I am surprised by your comment. We use the Fourier-domain approach in FastLocalCorrelationCoefficients.jl.

Also, regarding the data transfers, you can work with sub-images (at the added computational cost of recomputing the sub-image borders), e.g. as in the sketch below.
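Roughly along these lines (a hypothetical helper; `halo` should be at least the kernel half-width, and the clipped tiles at the image edges still need the usual boundary handling):

```julia
# split an image into tile × tile blocks, each extended by a halo of overlap
function tiles(img, tile, halo)
    n, m = size(img)
    [img[max(1, i - halo):min(n, i + tile - 1 + halo),
         max(1, j - halo):min(m, j + tile - 1 + halo)]
     for i in 1:tile:n, j in 1:tile:m]
end
```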

If you do not mind, please share the code of your experiment; at least one of us is going to learn something new 🙂

Thank you. I checked MLX and it has matrix multiplication. However, I did not find any indication that it has a sparse matrix type. Do you know how to use sparse matrices with MLX?

Nope. I am not at my computer, but MLX is where I would look if Metal.jl does not have what I want.
