Improving the performance of multiplying an arbitrary number of matrices

Guys,

I found something interesting that I really can’t explain. Can anyone help?

Take a look at the following benchmark:

```julia
using StaticArrays, BenchmarkTools

D1 = @SMatrix rand(3,3)
D2 = @SMatrix rand(3,3)

@inline function compose_rotation(D1::SMatrix{3,3}, D2::SMatrix{3,3}, Ds::SMatrix{3,3}...)
    result = D2*D1

    for Di in Ds
        result = Di*result
    end

    result
end

test() = D2*D1*D2*D1*D1*D2*D1*D2*D1

@btime test()
@btime compose_rotation(D1,D2,D1,D2,D1,D1,D2,D1,D2)
```

I am getting:

```
@btime test()
  614.213 ns (14 allocations: 1.45 KiB)
3×3 StaticArrays.SArray{Tuple{3,3},Float64,2,9}:
 31.6736  6.31623  21.519
 19.4169  3.87201  13.1921
 25.5171  5.08857  17.3361

@btime compose_rotation(D1,D2,D1,D2,D1,D1,D2,D1,D2)
  117.351 ns (1 allocation: 80 bytes)
3×3 StaticArrays.SArray{Tuple{3,3},Float64,2,9}:
 31.6736  6.31623  21.519
 19.4169  3.87201  13.1921
 25.5171  5.08857  17.3361
```
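Both calls return the same matrix, as expected: compose_rotation multiplies each extra argument on the left, so compose_rotation(A, B, C) returns C*B*A, and the nine-argument call above expands to exactly the product in test(). A minimal check of that ordering, assuming StaticArrays is installed (A, B and C are just fresh random matrices for illustration). The comparison uses ≈ because the two versions associate the products differently, so the last bits can differ by rounding:

```julia
using StaticArrays

# Same definition as in the benchmark above, repeated so this block runs on its own
@inline function compose_rotation(D1::SMatrix{3,3}, D2::SMatrix{3,3}, Ds::SMatrix{3,3}...)
    result = D2*D1
    for Di in Ds
        result = Di*result   # each new matrix is applied on the left
    end
    result
end

A = @SMatrix rand(3,3)
B = @SMatrix rand(3,3)
C = @SMatrix rand(3,3)

# A is applied first, then B, then C
@assert compose_rotation(A, B, C) ≈ C*B*A

# The benchmarked call reproduces the explicit product in test()
@assert compose_rotation(A, B, A, B, A, A, B, A, B) ≈ B*A*B*A*A*B*A*B*A
```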

Why is the multiplication using the function about 5x faster (117 ns vs. 614 ns, and 1 allocation vs. 14) than the explicit multiplication?
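One thing worth checking (an assumption on my part, not verified above): both test() and the benchmarked compose_rotation call read D1 and D2 as non-constant globals, which Julia cannot infer types for. A sketch of a fairer comparison passes the matrices in as arguments and interpolates them into @btime with `$`. The name test_args is hypothetical, introduced only for this comparison, and no timings are claimed here:

```julia
using StaticArrays, BenchmarkTools

# Same definition as in the benchmark above, repeated so this block runs on its own
@inline function compose_rotation(D1::SMatrix{3,3}, D2::SMatrix{3,3}, Ds::SMatrix{3,3}...)
    result = D2*D1
    for Di in Ds
        result = Di*result
    end
    result
end

# Hypothetical variant of test() that takes the matrices as arguments
# instead of reading the non-constant globals D1 and D2
test_args(D1, D2) = D2*D1*D2*D1*D1*D2*D1*D2*D1

D1 = @SMatrix rand(3,3)
D2 = @SMatrix rand(3,3)

# `$` splices the current values into the benchmark expression,
# so the timed code does not look up untyped globals
@btime test_args($D1, $D2)
@btime compose_rotation($D1, $D2, $D1, $D2, $D1, $D1, $D2, $D1, $D2)

# Both versions still compute the same product (up to rounding)
@assert test_args(D1, D2) ≈ compose_rotation(D1, D2, D1, D2, D1, D1, D2, D1, D2)
```

If the gap mostly disappears with this setup, the difference in the original benchmark comes from how the globals are accessed rather than from the multiplication itself.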