I am simulating a “mass” that obeys a law of motion. The algorithm I use to discretize to a finite grid has the property that each cell is distributed to `K`

others after being multiplied by something else. Formally, each x z_k is added to some y_{p_k} according to “patterns” `p`

(ie collections of indices).

There is an MWE below using `SVector`

, but that’s just for the purposes of an example, generally my type supports the relevant methods for `*`

, `zero`

, `+`

. `K`

is known statically.

This is not unlike a generalization of a sparse matrix multiplication, but with a very special index structure. I am looking for suggestions to improve it; specifically, is there a simple way to calculate a permutation of indexes in `xs`

/`ys`

so that the memory access for `ys`

is improved?

```
using StatsBase, StaticArrays, BenchmarkTools
N = 100
K = 4
M = 3
patterns = [SVector{K}(sample(1:N, 4; replace = false)...) for _ in 1:N]
xs = rand(SVector{M,Float64}, N)
ys = similar(xs)
zs = rand(SVector{K,Float64})
function combine_forward!(ys, xs::Vector{T},
patterns::Vector{SVector{K,Int}},
zs::SVector{K}) where {T,K}
N = length(xs)
fill!(ys, zero(T))
@inbounds for (x, pattern) in zip(xs, patterns)
for (j, z) in zip(pattern, zs)
ys[j] += x * z
end
end
ys
end
@btime combine_forward!(ys, xs, patterns, zs);
```