A (seemingly) more efficient way to use DiffEqOperators.jl

I found that the derivative operators are not very efficient when applied directly; there seems to be a more efficient way to use them. If I am wrong, please tell me.
For example, when I have:

using DiffEqOperators, BenchmarkTools

dx = 0.01
x = dx:dx:0.3
Δ1 = UpwindDifference(1, 1, dx, length(x), -0.1)    # 1st-order upwind first derivative, coefficient -0.1
Δ2 = CenteredDifference(2, 2, dx, length(x))        # 2nd-order centered second derivative
bc = RobinBC((0.1, 1E-4, 0.1), (0., 1., 0.), dx, 2)
u0 = @. exp(1 - x)
f1(u) = Δ1*bc*u + Δ2*bc*u
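
For context (assuming the RobinBC convention α·u + β·uₓ = γ on each boundary, and that the last argument of UpwindDifference is the coefficient), f1 should correspond to the spatial discretization of

$$
f_1(u) \approx -0.1\,\frac{\partial u}{\partial x} + \frac{\partial^2 u}{\partial x^2},
\qquad 0.1\,u + 10^{-4}\,u_x = 0.1 \ \text{(left)},
\qquad u_x = 0 \ \text{(right)}.
$$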

The time needed is:

@benchmark f1(u0)
BenchmarkTools.Trial:
  memory estimate:  1.36 KiB
  allocs estimate:  7
  --------------
  minimum time:     658.642 ns (0.00% GC)
  median time:      960.494 ns (0.00% GC)
  mean time:        3.880 μs (73.80% GC)
  maximum time:     1.536 ms (99.91% GC)
  --------------
  samples:          8005
  evals/sample:     162

Then, if I define:

d1 = Δ1*bc
d2 = Δ2*bc
f2(u) = d1*u + d2*u

The time needed is:

@benchmark f2(u0)
BenchmarkTools.Trial:
  memory estimate:  1.36 KiB
  allocs estimate:  7
  --------------
  minimum time:     698.387 ns (0.00% GC)
  median time:      1.034 μs (0.00% GC)
  mean time:        4.212 μs (74.19% GC)
  maximum time:     2.255 ms (99.94% GC)
  --------------
  samples:          9799
  evals/sample:     124
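
(One caveat, not explored in the original runs: d1 and d2, like Δ1, Δ2, and bc, are non-const globals, so both f1 and f2 pay for dynamic dispatch on every call. A sketch of type-stable bindings, with hypothetical names d1c and d2c:)

const d1c = Δ1*bc    # const so the compiler knows the operator types
const d2c = Δ2*bc
f2c(u) = d1c*u + d2c*u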

Furthermore, from what I understand, the composed operator has two parts: applying d1 to u0 is effectively a matrix multiply plus a vector, and Array(Δ1*bc) returns exactly those two pieces. So define:

A1, b1 = Array(Δ1*bc)    # concrete matrix plus affine (boundary) vector
A2, b2 = Array(Δ2*bc)
f3(u) = A1*u + b1 + A2*u + b2
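
(A quick sanity check, not in the original post, that the precomputed affine form matches applying the operator directly:)

# both should hold up to floating-point round-off
@assert A1*u0 + b1 ≈ (Δ1*bc)*u0
@assert f3(u0) ≈ f1(u0)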

This version is slightly faster:

@benchmark f3(u0)
BenchmarkTools.Trial:
  memory estimate:  1008 bytes
  allocs estimate:  3
  --------------
  minimum time:     534.043 ns (0.00% GC)
  median time:      959.574 ns (0.00% GC)
  mean time:        5.779 μs (82.36% GC)
  maximum time:     3.336 ms (99.97% GC)
  --------------
  samples:          4767
  evals/sample:     188

Actually, A1 and A2 are tridiagonal matrices here, so we can combine them further:

using LinearAlgebra    # for Tridiagonal and mul!

A12 = Tridiagonal(A1 + A2)    # exploit the banded structure
b12 = b1 + b2
f4(u) = A12*u + b12
@benchmark f4(u0)
BenchmarkTools.Trial: 
  memory estimate:  672 bytes
  allocs estimate:  2
  --------------
  minimum time:     175.385 ns (0.00% GC)
  median time:      371.923 ns (0.00% GC)
  mean time:        3.443 μs (89.20% GC)
  maximum time:     820.765 μs (99.96% GC)
  --------------
  samples:          1944
  evals/sample:     780

The last version is much faster than the first.

Considering that A1 and A2 may not be tridiagonal matrices (for example, when the upwind scheme is 2nd-order accurate), define:

A11 = A1 + A2    # plain dense fallback when the sum is not tridiagonal
f5(u) = A11*u + b12
@benchmark f5(u0)
BenchmarkTools.Trial:
  memory estimate:  672 bytes
  allocs estimate:  2
  --------------
  minimum time:     281.955 ns (0.00% GC)
  median time:      497.744 ns (0.00% GC)
  mean time:        3.153 μs (84.15% GC)
  maximum time:     1.925 ms (99.96% GC)
  --------------
  samples:          6334
  evals/sample:     266

This is still faster than the first version.
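
If even the remaining allocations matter, the same idea can be taken one step further with an in-place version (a sketch, not part of the benchmarks above; f4! is a hypothetical name):

# in-place variant: writes into a preallocated buffer instead of allocating a new vector
function f4!(du, u)
    mul!(du, A12, u)    # du = A12 * u (mul! from LinearAlgebra, loaded above)
    du .+= b12          # add the boundary contribution in place
    return du
end

du = similar(u0)
@benchmark f4!($du, $u0)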

Is it surprising that f4 is faster than f1?
You precompute some of the work the operator would otherwise do, so it seems logical that f4 is faster.

(Or is your question, more generally, how to make it faster? If that is the case, it would be useful to know how you actually want to use the operators.)

I remember that I once used a previous version of DiffEqOperators.jl and got much lower efficiency than with a manual discretization. Today I thought about this idea, so I tried a few things…

Can you share a minimal code example?
DiffEqOperators.jl can be used in different ways with the ODE solvers (for example, symbolically or not), which influences what kind of optimisations are necessary.
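
For instance, one common pattern (just a sketch, assuming OrdinaryDiffEq is available and reusing the precomputed A12 and b12 from above) is to wrap the discretization in an in-place ODE right-hand side:

using OrdinaryDiffEq

# hypothetical usage: solve u_t = -0.1 u_x + u_xx with the Robin BCs above
rhs!(du, u, p, t) = (mul!(du, A12, u); du .+= b12; nothing)
prob = ODEProblem(rhs!, copy(u0), (0.0, 1.0))
sol = solve(prob, Tsit5())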

Thanks for the reply. Unfortunately, it was many months ago with an older version… I don't know if I can find an example. Or maybe when I have one, I can post it on…

Yes, it needs more optimizations. It should use @turbo internally.