Improve performance of function that produces and hcats SMatrices

I have two functions, which for our MWE could be the following:

``````using BenchmarkTools, StaticArrays
@inline function f(x, p, n)
@inbounds x1, x2, x3 = x[1], x[2], x[3]
SVector( 3.8*x1*(1-x1) - 0.05*(x2+0.35)*(1-2*x3),
0.1*( (x2+0.35)*(1-2*x3) - 1 )*(1 - 1.9*x1),
3.78*x3*(1-x3)+0.2*x2 )
end
@inline function jac(x, p, n)
@SMatrix [3.8*(1 - 2x[1]) -0.05*(1-2x[3]) 0.1*(x[2] + 0.35);
-0.19((x[2] + 0.35)*(1-2x[3]) - 1)  0.1*(1-2x[3])*(1-1.9x[1])  -0.2*(x[2] + 0.35)*(1-1.9x[1]);
0.0  0.2  3.78(1-2x[3]) ]
end
``````

I donâ€™t think the actual functions matter, as long as the first returns `SVector` and the latter returns `SMatrix`.
Letâ€™s do timings:

``````p = nothing
s = rand(SVector{3}); Q = rand(SMatrix{3,3})
@btime \$f(\$s, \$p, 0)
@btime \$jac(\$s, \$p, 0)
6.158 ns (0 allocations: 0 bytes)
5.337 ns (0 allocations: 0 bytes)
``````

Now, I create a new function, based on the above:

``````        ws_index = SVector{3, Int}(2:4...)
tangentf = (u, p, t) -> begin
du = f(u[:, 1], p, t)
J = jac(u[:, 1], p, t)
dW = J*u[:, ws_index]
return hcat(du, dW)
end
``````

and benchmark to see:

``````S = hcat(s, Q)
@btime \$(tangentf)(\$S, \$p, 0)

109.685 ns (6 allocations: 496 bytes)
3Ă—4 StaticArrays.SArray{Tuple{3,4},Float64,2,12}:
0.567631  -1.8819    -1.73773   -1.12878
0.102564   0.353318   0.263594   0.288192
0.656533  -1.68341   -0.506125  -1.94484
``````

There are allocations done, and the timing is much larger than the timing of applying the 2 individual functions and performing an `hcat` (which has miniscule timing).

So, how can I improve this and most importantly: what am I doing wrong?

Donâ€™t use global variables when benchmarking.

1 Like

Thanks!

``````tangentf = let ws_index = SVector{3, Int}(2:4...)
tangentf = (u, p, t) -> begin
du = f(u[:, 1], p, t)
J = jac(u[:, 1], p, t)
dW = J*u[:, ws_index]
return hcat(du, dW)
end
end
``````
``````
S = hcat(s, Q)
@btime \$(tangentf)(\$S, \$p, 0)

36.128 ns (0 allocations: 0 bytes)
3Ă—4 StaticArrays.SArray{Tuple{3,4},Float64,2,12}:
0.872362   1.0066    0.474349   0.599955
-0.0237808  0.143415  0.0510413  0.0682469
0.742914   0.813412  1.08132    1.91705
``````

Gives super massive improvements. Still it is not close to the individual function evaluationsâ€¦ Why is that?

Using a dedicated `struct` gave even more performance boost:

``````struct TangentOOP{F, JAC, k}
f::F
jacobian::JAC
ws::SVector{k, Int}
end
@inline function (tan::TangentOOP)(u, p, t)
du = tan.f(u[:, 1], p, t)
J = tan.jacobian(u[:, 1], p, t)
dW = J*u[:, tan.ws]
return hcat(du, dW)
end
``````

benchmarking the later with

``````tan = TangentOOP(f, jac, SVector{3, Int}(2:4...))

@btime (\$tan)(\$S, \$p, 0);

22.169 ns (0 allocations: 0 bytes)
``````