# StaticArrays:construct a SMatrix from three vectors

I’m using SMatrix from StaticArrays.jl for calculation of coordinate system transformation. In the hotspot, I’m running following (simplified) code:

``````using StaticArrays, BenchmarkTools
a =  rand(3)
b =  rand(3)
c =  rand(3)
d =  rand(3)
m = hcat((a-d),(b-d),(c-d))
sm = SMatrix{3,3}(hcat((a-d),(b-d),(c-d))...)
m==sm

f1(a,b,c,d) = SMatrix{3,3}(hcat((a-d),(b-d),(c-d))...)
f2(a,b,c,d) = SMatrix{3,3}((a-d)...,(b-d)...,(c-d)...)

@inline function smatrix33(a,b,c)
@inbounds A = SMatrix{3,3}(a[1], a[2], a[3], b[1], b[2], b[3], c[1], c[2], c[3])
A
end
f3(a,b,c,d) = smatrix33((a-d),(b-d),(c-d))
``````

The result medians are for f1, f2 and f3 1.2 us, 780 ns and 174 ns.

Is there a better way to fill an SMatrix by column / row vectors, or is it necessary to write a helper function to get the full speed?

2 Likes
``````julia> A = @SMatrix rand(3, 4)
3×4 StaticArrays.SArray{Tuple{3,4},Float64,2,12}:
0.644328  0.225848  0.36689   0.243095
0.590968  0.462177  0.206045  0.404283
0.973567  0.342697  0.65298   0.901137

julia> f4(A) = A[:, SVector(1, 2, 3)] .- A[:, 4]

julia> @btime f4(\$A)
10.637 ns (0 allocations: 0 bytes)
3×3 StaticArrays.SArray{Tuple{3,3},Float64,2,9}:
0.401233   -0.0172464   0.123795
0.186685    0.057894   -0.198239
0.0724302  -0.55844    -0.248157
``````

I know the arithmetic operations between static arrays are amazingly fast. My concern is, how to effectively construct such a SMatrix out of plain vectors (arrays).

Actually if a,b,c,d are `SVector` already, then just using `hcat` seems the fastest approach. Got 100 ns for

``````a =  @SVector rand(3)
b =  @SVector rand(3)
c =  @SVector rand(3)
d =  @SVector rand(3)
f5(a,b,c,d) = hcat((a-d),(b-d),(c-d))
``````

correctly returning a 3x3 SArray.

I didn’t know there is an optimized method for `hcat`ting `SVectors` but apparently there is. I really do appreciate Matt Bauman’s work on this package.

But with plain arrays, splatting is too slow and the explicit approach of smatrix33 is the winner so far.

``````a = rand(3)
b = rand(3)
c = rand(3)
d = rand(3)

function f6(a, b, c, d)
as = SVector{3}(a)
bs = SVector{3}(b)
cs = SVector{3}(c)
ds = SVector{3}(d)
return hcat(as-ds, bs-ds, cs-ds)
end

julia> @btime f6(\$a, \$b, \$c, \$d)
17.349 ns (0 allocations: 0 bytes)
3×3 StaticArrays.SArray{Tuple{3,3},Float64,2,9}:
-0.543394   -0.545152  -0.275096
0.0925233   0.778087  -0.0803328
-0.065766    0.832741   0.527109
``````
1 Like

Yes, this is the fastest and easiest way to accomplish this.

Thank you, @DNF, I’ve marked the correct answer. Its 30ns in my case, but that’s on a laptop…

My result is on a laptop too, four years old. Perhaps I’m getting SIMD vectorization, or something. Have you tried starting julia with the O3 option like this?

`\$ julia -03`

Oh, and did you time it exactly like this:

``````julia> using BenchmarkTools
julia> @btime f6(\$a, \$b, \$c, \$d)
``````

?
If I remove the `\$` signs I get ~40ns.

Thank you for pointing me to the `\$` use - I usually don’t forget it to use this for more expensive arguments, but here I forgot. Now with `-O3` and `\$` I’m at 13 ns… on an i5-6300U 2.4GHz.

Still I doubt if SIMD was used. Do you know what to look for in the @code_llvm or @code_native output? Even for such an easy function as f6, I get lost in the output… Is there a specific instruction indicating a SIMD parallel subtract or parallel move?

Just by chance, do you know how to pass the -O3 option to Julia running as Jupyter backend?

You can look for e.g. `fadd <4 x double> %6, %10`, basically anything that involves a `<q x type>` where `q` is usually 2, or 4, and `type` is the LLVM type.

1 Like

Thank you, big help.
Petr