# Efficient multiplication of out-of-memory sparse matrix and an in-memory dense vector

Say I have a really huge sparse matrix A, and an equally huge dense vector v. I need to do `A * v`. The structure of `A` is simple, like a -1 -1 4 -1 -1 stencil in 2D for example (in practice it’s a bit more complicated) so that I want to compute `A * v` without storing `A` itself in memory (which would be a real waste), only v. The effect of `A` on a given part of `v` would be computed on the fly. I also need to use all my CPU cores as efficiently as possible, make use of SIMD, keep cache locality and minimize memory fetches. Is there a julia package that can help for this kind of problem?

If I understand your problem correctly, you might be better off using `imfilter` from the JuliaImages stack, which will avoid allocating a matrix `A`.

what about a Matrix Free like function if you know how to evaluate it?

what about a Matrix Free like function if you know how to evaluate it?

Ah, you mean something like `LinearMaps.jl`, yes? It does seem like the perfect solution. However, for the parallelization I guess I would need to take care of that myself when defining the map function. I was hoping to find a specific solution that has been optimised for me already, but I guess that is too much to ask, as the kind of optimization tricks probably depend on `A`.

If I understand your problem correctly, you might be better off using `imfilter` from the JuliaImages stack, which will avoid allocating a matrix `A` .

This might be closer to what I was looking for, yes! I’ll look into it.

Thanks to both

Well, you have `DiffEqOperators` for spencils if you dont want to bother writing the forloops on each piece of your distributed vector. And you can use DistributedArrays for managing your vector.

Ah, yes! Very nice. Unfortunately my “stencil” is an arbritrary comlicated “hopping” matrix on the lattice discrete. I’m not sure one can add completely arbitrary stencil coefficients with `DiffEqOperators`. I’ll look into it.

DiffEqOperators handles this case now. It composes operators of different dimensions and does a cache-optimized convolution call. We’ll get the docs updated.

1 Like

Excellent! Looking forward to the docs, thanks!