OK, I should have just pasted the link I guess!
Having sizes in the type parameters is surely ideal for @nexprs. But whether this is actually faster, you will have to try, I suppose. On my computer, today:
julia> @btime rotation!($Crot, $Q, $C, $mid)
455.543 ns (0 allocations: 0 bytes)
julia> @btime rotation_avx!($Crot, $Q, $C, $mid);
136.706 ns (0 allocations: 0 bytes)
julia> @btime rotation_nexpr!($Crot, $Q, $C);
180.331 ns (0 allocations: 0 bytes)
julia> QS = @SMatrix rand(3,3);
julia> @btime rotation!($Crot, $QS, $C, $mid);
377.343 ns (0 allocations: 0 bytes)
julia> @btime rotation_avx!($Crot, $QS, $C, $mid); # LoopVectorization.check_args(QS) == false
377.142 ns (0 allocations: 0 bytes)
julia> @btime rotation_nexpr!($Crot, $QS, $C);
167.569 ns (0 allocations: 0 bytes)
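
For anyone wondering why sizes in the type parameters help @nexprs: the macro needs a literal integer at expansion time, so the size has to be known from the types alone. Here is a minimal sketch of that idea, not the rotation_nexpr! from this thread. The function name unrolled_sum and the use of Val to carry the length are my own assumptions for illustration.

```julia
using Base.Cartesian: @nexprs

# Hypothetical helper (not from the thread): sums the first N entries of x.
# N lives in the type domain (via Val), so the generated function can splice
# it into @nexprs as a literal and emit N unrolled statements.
@generated function unrolled_sum(x::AbstractVector{T}, ::Val{N}) where {T,N}
    quote
        s = zero(T)
        # Expands to: s += x[1]; s += x[2]; ...; s += x[N]
        @nexprs $N i -> (s += x[i])
        return s
    end
end

unrolled_sum([1.0, 2.0, 3.0], Val(3))  # 6.0
```

With a plain Array the length is not part of the type, which is why it has to be passed separately here. With an SMatrix or SVector the size is already a type parameter, so the Val argument would not be needed.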