I was trying to improve the runtime of a model I am running in DiffEq.jl. I have been going through the optimization steps for an in-place right-hand side eqnsystem!(du, u, p, t), but I am a bit stuck on how to optimize sparse matrix multiplication.

In essence, I need to evaluate a flux operator at every time step and with it compute
(Dx * (D * Dx)) * u
Here Dx is a sparse differentiation matrix, D is a diagonal matrix (which also updates at every time step), and u is the solution vector. I want to preallocate a term P = Dx * (D * Dx), where P has a fixed sparsity pattern, in the hope of reducing allocations, since they bog my code down substantially.
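For concreteness, a minimal sketch of the setup described above — the grid size, the central-difference stencil, and the random diagonal are placeholders, not the actual model:

```julia
using SparseArrays, LinearAlgebra

n  = 100
dx = 1.0 / n

# Dx: sparse central first-difference differentiation matrix (illustrative stencil)
Dx = spdiagm(-1 => fill(-1 / (2dx), n - 1), 1 => fill(1 / (2dx), n - 1))

# D: diagonal coefficient matrix; in the real model it is updated every time step
D = Diagonal(rand(n))

u = rand(n)

# The flux term as written in the question:
flux = (Dx * (D * Dx)) * u
```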

I’ve tried things such as mul!(P, Dx, Dx), but this method is extremely slow.

Currently the best-performing solution is just evaluating it with P .= Dx * D * Dx, but this is not truly in-place: the right-hand side still creates a lot of allocations, which grow substantially with the number of points. Any advice would help!
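To illustrate why the broadcast assignment still allocates (operator sizes here are placeholders): Julia evaluates the right-hand side first, materializing temporary sparse matrices for the two products, and only then copies the result into P.

```julia
using SparseArrays, LinearAlgebra

n  = 1000
Dx = spdiagm(-1 => -ones(n - 1), 1 => ones(n - 1))
D  = Diagonal(rand(n))
P  = Dx * D * Dx   # establishes the fixed sparsity pattern

# The RHS below builds two temporary sparse matrices before the copy into P,
# so the broadcast assignment does not eliminate allocations.
alloc = @allocated (P .= Dx * D * Dx)
```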

Depending on the size, you can use modelingtoolkitize to trace out the sparse structure down to scalarized flat equations and remove all of the sparsity overhead; that can sometimes speed things up.

I had a moment of clarity and realized I didn’t have to preallocate the operator like that at all; instead I could just multiply u by each operator separately (very silly of me to forget basic linear algebra). Doing this with mul!(flux_x, Dx, u) and applying each operator one by one lets me do the matrix operations in place. I got around a 5-10x speedup and reduced the allocations from 100 MiB to around 1 MiB when solving a system with 25k unknowns.
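The operator-by-operator approach can be sketched as follows — buffer names and sizes are illustrative; only sparse matrix-vector products and an in-place diagonal scaling are needed, never a matrix-matrix product:

```julia
using SparseArrays, LinearAlgebra

n  = 100
Dx = spdiagm(-1 => -ones(n - 1), 1 => ones(n - 1))
D  = Diagonal(rand(n))
u  = rand(n)

# Preallocated work buffers
tmp    = similar(u)
flux_x = similar(u)

# (Dx * (D * Dx)) * u, one operator at a time, all in place:
mul!(tmp, Dx, u)        # tmp    = Dx * u
tmp .*= D.diag          # tmp    = D * (Dx * u)   (diagonal scaling in place)
mul!(flux_x, Dx, tmp)   # flux_x = Dx * (D * (Dx * u))
```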

Just to answer your questions, though: I am preallocating everything by using a closure. So, for example, eqnclosure!(du, u, p, t, Dx, flux_x, ...) -> eqnsystem!(du, u, p, t) after having preallocated Dx, flux_x, etc. Also, yes, everything is type stable.
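A hedged sketch of that closure pattern — eqnclosure!, the buffer names, and the toy right-hand side are placeholders, not the original model:

```julia
using SparseArrays, LinearAlgebra

# All buffers are captured by the closure, not allocated inside the RHS.
function eqnclosure!(du, u, p, t, Dx, D, tmp, flux_x)
    mul!(tmp, Dx, u)        # tmp    = Dx * u
    tmp .*= D.diag          # tmp    = D * (Dx * u)
    mul!(flux_x, Dx, tmp)   # flux_x = Dx * (D * (Dx * u))
    du .= .-flux_x          # toy dynamics built from the flux term
    return nothing
end

n      = 50
Dx     = spdiagm(-1 => -ones(n - 1), 1 => ones(n - 1))
D      = Diagonal(rand(n))
tmp    = zeros(n)
flux_x = zeros(n)

# Captures the preallocated buffers while keeping the (du, u, p, t)
# signature that DifferentialEquations.jl expects:
eqnsystem!(du, u, p, t) = eqnclosure!(du, u, p, t, Dx, D, tmp, flux_x)

du = zeros(n)
u0 = rand(n)
eqnsystem!(du, u0, nothing, 0.0)
```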

Thanks for the suggestion though. I’ll keep the modelingtoolkitize method for tracing sparse matrices in my back pocket for a rainy day.