LoopVectorization.jl: adding `@avx` makes code slower

OK, I should have just pasted the link I guess!

Having the sizes in the type parameters is surely ideal for @nexprs. But whether this is actually faster, you will have to try, I suppose. On my computer, today:

julia> @btime rotation!($Crot, $Q, $C, $mid)
  455.543 ns (0 allocations: 0 bytes)

julia> @btime rotation_avx!($Crot, $Q, $C, $mid);
  136.706 ns (0 allocations: 0 bytes)

julia> @btime rotation_nexpr!($Crot, $Q, $C);
  180.331 ns (0 allocations: 0 bytes)

julia> QS = @SMatrix rand(3,3);

julia> @btime rotation!($Crot, $QS, $C, $mid);
  377.343 ns (0 allocations: 0 bytes)

julia> @btime rotation_avx!($Crot, $QS, $C, $mid); # LoopVectorization.check_args(QS) == false
  377.142 ns (0 allocations: 0 bytes)

julia> @btime rotation_nexpr!($Crot, $QS, $C);
  167.569 ns (0 allocations: 0 bytes)
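
For anyone unsure why known sizes matter for @nexprs, here is a minimal sketch. It is not the `rotation_nexpr!` timed above; `dot3` is a made-up name just for illustration. `Base.Cartesian.@nexprs` needs the repeat count as a literal, so the loop length has to be known when the code is generated:

```julia
using Base.Cartesian: @nexprs

# Hypothetical helper, not from this thread: dot product of exactly 3 elements.
# The count (3) must be a literal integer for @nexprs to unroll it.
function dot3(a, b)
    s = zero(eltype(a)) * zero(eltype(b))
    # Expands to: s += a[1]*b[1]; s += a[2]*b[2]; s += a[3]*b[3]
    @nexprs 3 i -> (s += a[i] * b[i])
    return s
end

dot3([1, 2, 3], [4, 5, 6])
```

If the size lives in a type parameter instead of being hard-coded, a `@generated` function can read it from the type and splice it in where the literal 3 is. That is presumably what makes static sizes a good fit, but I have not benchmarked that variant here.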