LoopVectorization.jl: adding `@avx` makes code slower

I am given a multi-dimensional `Array` and have to multiply it by an `SMatrix` along multiple dimensions. Using a straightforward approach in 2 dimensions, adding `@avx` from the great package LoopVectorization.jl increases performance as desired. However, doing the same in 3 dimensions, `@avx` makes the code slower.
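
In index notation, the operation is `data_out[v, i, j] = Σ_{ii, jj} vandermonde[i, ii] * vandermonde[j, jj] * data_in[v, ii, jj]` in 2D, with an additional factor `vandermonde[k, kk]` and a sum over `kk` in 3D.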

2D case
julia> using BenchmarkTools, LoopVectorization, StaticArrays

julia> function multiply_coordinatewise_squeezed!(
           data_out::AbstractArray{T, 3}, data_in::AbstractArray{T, 3}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                      (size(data_out, 2) == size(data_out, 3) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @inbounds for j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for jj in 1:n_nodes_in, ii in 1:n_nodes_in
             acc += vandermonde[i, ii] * vandermonde[j, jj] * data_in[v, ii, jj]
             end
             data_out[v, i, j] = acc
           end
         end

         return data_out
       end

julia> function multiply_coordinatewise_squeezed_avx!(
           data_out::AbstractArray{T, 3}, data_in::AbstractArray{T, 3}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                      (size(data_out, 2) == size(data_out, 3) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @avx for j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for jj in 1:n_nodes_in, ii in 1:n_nodes_in
               acc += vandermonde[i, ii] * vandermonde[j, jj] * data_in[v, ii, jj]
             end
             data_out[v, i, j] = acc
           end
         end

         return data_out
       end

julia> function run_benchmarks_2d(;n_variables=4, n_nodes_in=4, n_nodes_out=n_nodes_in)
         data_in  = randn(n_variables, n_nodes_in,  n_nodes_in)
         data_out = randn(n_variables, n_nodes_out, n_nodes_out)
         vandermonde_dynamic = randn(n_nodes_out, n_nodes_in)
         vandermonde_static  = SMatrix{n_nodes_out, n_nodes_in}(vandermonde_dynamic)

         println("multiply_coordinatewise_squeezed!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()
         
         println("multiply_coordinatewise_squeezed!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         nothing
       end

julia> run_benchmarks_2d()
multiply_coordinatewise_squeezed!, vandermonde_static
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     421.402 ns (0.00% GC)
  median time:      422.090 ns (0.00% GC)
  mean time:        425.764 ns (0.00% GC)
  maximum time:     956.447 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     199

multiply_coordinatewise_squeezed!, vandermonde_dynamic
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     738.446 ns (0.00% GC)
  median time:      807.826 ns (0.00% GC)
  mean time:        815.222 ns (0.00% GC)
  maximum time:     1.771 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     92

multiply_coordinatewise_squeezed_avx!, vandermonde_static
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     331.277 ns (0.00% GC)
  median time:      331.781 ns (0.00% GC)
  mean time:        335.048 ns (0.00% GC)
  maximum time:     805.723 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     224

multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     98.490 ns (0.00% GC)
  median time:      100.468 ns (0.00% GC)
  mean time:        101.441 ns (0.00% GC)
  maximum time:     213.163 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     946

3D case
julia> using BenchmarkTools, LoopVectorization, StaticArrays

julia> function multiply_coordinatewise_squeezed!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @inbounds for k in 1:n_nodes_out, j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for kk in 1:n_nodes_in, jj in 1:n_nodes_in, ii in 1:n_nodes_in
               acc += vandermonde[i, ii] * vandermonde[j, jj] * vandermonde[k, kk] * data_in[v, ii, jj, kk]
             end
             data_out[v, i, j, k] = acc
           end
         end

         return data_out
       end

julia> function multiply_coordinatewise_squeezed_avx!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @avx for k in 1:n_nodes_out, j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for kk in 1:n_nodes_in, jj in 1:n_nodes_in, ii in 1:n_nodes_in
               acc += vandermonde[i, ii] * vandermonde[j, jj] * vandermonde[k, kk] * data_in[v, ii, jj, kk]
             end
             data_out[v, i, j, k] = acc
           end
         end

         return data_out
       end

julia> function run_benchmarks_3d(;n_variables=4, n_nodes_in=4, n_nodes_out=n_nodes_in)
         data_in  = randn(n_variables, n_nodes_in,  n_nodes_in,  n_nodes_in)
         data_out = randn(n_variables, n_nodes_out, n_nodes_out, n_nodes_out)
         vandermonde_dynamic = randn(n_nodes_out, n_nodes_in)
         vandermonde_static  = SMatrix{n_nodes_out, n_nodes_in}(vandermonde_dynamic)

         println("multiply_coordinatewise_squeezed!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()
         
         println("multiply_coordinatewise_squeezed!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         nothing
       end

julia> run_benchmarks_3d()
multiply_coordinatewise_squeezed!, vandermonde_static
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.969 μs (0.00% GC)
  median time:      10.972 μs (0.00% GC)
  mean time:        11.084 μs (0.00% GC)
  maximum time:     28.741 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed!, vandermonde_dynamic
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.767 μs (0.00% GC)
  median time:      12.794 μs (0.00% GC)
  mean time:        13.513 μs (0.00% GC)
  maximum time:     61.101 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed_avx!, vandermonde_static
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.073 μs (0.00% GC)
  median time:      15.763 μs (0.00% GC)
  mean time:        16.819 μs (0.00% GC)
  maximum time:     68.479 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     955.130 ns (0.00% GC)
  median time:      957.609 ns (0.00% GC)
  mean time:        969.877 ns (0.00% GC)
  maximum time:     2.853 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     23

1. Did I do anything wrong in the 3D case with `vandermonde::SMatrix`?
2. Is there an explanation why `@avx` is significantly faster for `vandermonde::Matrix` than for `vandermonde::SMatrix`?

Edit: Compared `vandermonde::Matrix` and `vandermonde::SMatrix` after reading @mcabbott’s comment below

I don’t see obvious mistakes, although I’m not sure StaticArrays helps here. But there’s an algorithmic opportunity: it’s much more efficient to do repeated matrix multiplication in series than to write one big set of loops, since one fused set of loops costs O(n^(2d)) multiply-adds per variable in d dimensions, while d successive contractions cost only O(d·n^(d+1)). Here’s my quick comparison:

using LoopVectorization, Tullio, BenchmarkTools, Test

# Functions as in the question, more compactly:
loops!(out, vand, data::Array{<:Any,3}) = @tullio out[v, i, j] = vand[i, ii] * vand[j, jj] * data[v, ii, jj] threads=false
loops!(out, vand, data::Array{<:Any,4}) = @tullio out[v, i, j, k] = vand[i, ii] * vand[j, jj] * vand[k, kk] * data[v, ii, jj, kk] threads=false

N = 4; vand = rand(N,N); data3 = randn(N,N,N); data4 = randn(N,N,N,N);
out3 = similar(data3); out4 = similar(data4);

@btime loops!($out3, $vand, $data3); #   136.288 ns [ 1.192 μs without @avx]
@btime loops!($out4, $vand, $data4); # 1.281 μs     [19.139 μs without]

# Matrix multiplication in series instead:
function series!(out, vand, data::Array{<:Any,3}, cache=similar(out))
    @tullio cache[v, ii, j] = vand[j, jj] * data[v, ii, jj] threads=false
    @tullio out[v, i, j] = vand[i, ii] * cache[v, ii, j] threads=false
end
function series!(out, vand, data::Array{<:Any,4}, cache=similar(out))
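    # `out` doubles as scratch for the first contraction, so a single extra
    # buffer (`cache`) suffices for all three contractions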
    @tullio out[v, ii, jj, k] = vand[k, kk] * data[v, ii, jj, kk] threads=false
    @tullio cache[v, ii, j, k] = vand[j, jj] * out[v, ii, jj, k] threads=false
    @tullio out[v, i, j, k] = vand[i, ii] * cache[v, ii, j, k] threads=false
end

@test loops!(similar(out3), vand, data3) ≈ series!(similar(out3), vand, data3)
@test loops!(similar(out4), vand, data4) ≈ series!(similar(out4), vand, data4)

@btime series!($out3, $vand, $data3); # 128.236 ns (1 allocation: 624 bytes)
@btime series!($out4, $vand, $data4); # 593.971 ns (2 allocations: 2.17 KiB)

# and if you provide the buffer:
@btime series!($out3, $vand, $data3, $(similar(out3))); #  79.198 ns
@btime series!($out4, $vand, $data4, $(similar(out4))); # 316.962 ns (0 allocations) [3.506 μs without @avx]

This issue discusses a similar problem with 4 multiplications, in which there was quite a lot of room to do even better. (Edit: reverted to a version with an intermediate cache.)

Thanks a lot, @mcabbott! I know about the possibility of rewriting the code as a sequence of matrix multiplications but didn’t want to clutter this question with another question on optimal implementations of that approach - but you’ve already seen that and provided quite a nice one, thanks! I will look through that issue and try to implement a version using Base.Cartesian.@nexprs. The only difficulty seems to be that my bounds are encoded as type parameters of the given StaticArray, but I hope to circumvent that by using @generated functions. Or is there a better approach?
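
For illustration, splicing a static size into `Base.Cartesian.@nexprs` inside an `@generated` function could look roughly like this (a minimal sketch with a hypothetical `unrolled_sum`, not the actual code):

julia> using StaticArrays

julia> @generated function unrolled_sum(v::SVector{N}) where N
         quote
           acc = zero(eltype(v))
           # N is an ordinary integer here, taken from the type parameter,
           # so @nexprs can unroll the loop at compile time
           Base.Cartesian.@nexprs $N i -> (acc += v[i])
           acc
         end
       end
unrolled_sum (generic function with 1 method)

julia> unrolled_sum(SVector(1.0, 2.0, 3.0))
6.0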

OK, I should have just pasted the link I guess!

Having sizes in the type parameters is surely ideal for @nexprs. But whether this is faster, you will have to try I suppose. On my computer, today:

julia> @btime rotation!($Crot, $Q, $C, $mid)
  455.543 ns (0 allocations: 0 bytes)

julia> @btime rotation_avx!($Crot, $Q, $C, $mid);
  136.706 ns (0 allocations: 0 bytes)

julia> @btime rotation_nexpr!($Crot, $Q, $C);
  180.331 ns (0 allocations: 0 bytes)

julia> QS = @SMatrix rand(3,3);

julia> @btime rotation!($Crot, $QS, $C, $mid);
  377.343 ns (0 allocations: 0 bytes)

julia> @btime rotation_avx!($Crot, $QS, $C, $mid); # LoopVectorization.check_args(QS) == false
  377.142 ns (0 allocations: 0 bytes)

julia> @btime rotation_nexpr!($Crot, $QS, $C);
  167.569 ns (0 allocations: 0 bytes)

Thanks again, @mcabbott! I’ve added a comparison of `Matrix` and `SMatrix` to my first post. Do you have an idea why `@avx` is significantly faster for `Matrix` than for `SMatrix`?

I don’t think @avx does anything for StaticArrays: check_args is how it decides whether to run a fallback version (which is just your loops) instead of its own vectorised code. On your tests this fallback seems to be slower than the plain loops, which is odd.
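
For example:

julia> using LoopVectorization, StaticArrays

julia> A = rand(4, 4);

julia> LoopVectorization.check_args(A)
true

julia> LoopVectorization.check_args(SMatrix{4,4}(A))
false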

Okay, thanks!

I've also tested an implementation of the 3D case based on `Base.Cartesian.@nexprs` - it can be significantly faster, even faster than the sequential implementation with `@avx`.
julia> using BenchmarkTools, LoopVectorization, StaticArrays

julia> function multiply_coordinatewise_squeezed!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @inbounds for k in 1:n_nodes_out, j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for kk in 1:n_nodes_in, jj in 1:n_nodes_in, ii in 1:n_nodes_in
               acc += vandermonde[i, ii] * vandermonde[j, jj] * vandermonde[k, kk] * data_in[v, ii, jj, kk]
             end
             data_out[v, i, j, k] = acc
           end
         end

         return data_out
       end

julia> function multiply_coordinatewise_squeezed_avx!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde, n_variables) where T
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == n_nodes_out) &&
                      (size(data_in, 1) >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         @avx for k in 1:n_nodes_out, j in 1:n_nodes_out, i in 1:n_nodes_out
           for v in 1:n_variables
             acc = zero(eltype(data_out))
             for kk in 1:n_nodes_in, jj in 1:n_nodes_in, ii in 1:n_nodes_in
               acc += vandermonde[i, ii] * vandermonde[j, jj] * vandermonde[k, kk] * data_in[v, ii, jj, kk]
             end
             data_out[v, i, j, k] = acc
           end
         end

         return data_out
       end

julia> function multiply_coordinatewise_sequential_avx!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde, n_variables;
           tmp1=zeros(eltype(data_out), n_variables, size(vandermonde, 1), size(vandermonde, 2), size(vandermonde, 2)),
           tmp2=zeros(eltype(data_out), n_variables, size(vandermonde, 1), size(vandermonde, 1), size(vandermonde, 2))) where T
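         # NOTE: the default tmp1/tmp2 buffers above allocate on every call; pass
         # preallocated (zeroed) buffers via the keyword arguments to avoid the
         # allocations seen in the benchmarks below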
         n_nodes_out = size(vandermonde, 1)
         n_nodes_in  = size(vandermonde, 2)
         data_out .= zero(eltype(data_out))

         @boundscheck begin
           inbounds = (size(data_out, 1) >= n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == n_nodes_out) &&
                      (size(data_in, 1)  >= n_variables) &&
                      (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == n_nodes_in)
           inbounds || throw(BoundsError())
         end

         # Interpolate in x-direction
         # tmp1 = zeros(eltype(data_out), n_variables, n_nodes_out, n_nodes_in, n_nodes_in)
         @avx for k in 1:n_nodes_in, j in 1:n_nodes_in, i in 1:n_nodes_out
           for ii in 1:n_nodes_in
             for v in 1:n_variables
               tmp1[v, i, j, k] += vandermonde[i, ii] * data_in[v, ii, j, k]
             end
           end
         end

         # Interpolate in y-direction
         # tmp2 = zeros(eltype(data_out), n_variables, n_nodes_out, n_nodes_out, n_nodes_in)
         @avx for k in 1:n_nodes_in, j in 1:n_nodes_out, i in 1:n_nodes_out
           for jj in 1:n_nodes_in
             for v in 1:n_variables
               tmp2[v, i, j, k] += vandermonde[j, jj] * tmp1[v, i, jj, k]
             end
           end
         end

         # Interpolate in z-direction
         @avx for k in 1:n_nodes_out, j in 1:n_nodes_out, i in 1:n_nodes_out
           for kk in 1:n_nodes_in
             for v in 1:n_variables
               data_out[v, i, j, k] += vandermonde[k, kk] * tmp2[v, i, j, kk]
             end
           end
         end

         return data_out
       end

julia> @generated function multiply_coordinatewise_sequential_nexpr!(
           data_out::AbstractArray{T, 4}, data_in::AbstractArray{T, 4}, vandermonde::SMatrix{n_nodes_out,n_nodes_in}, ::Val{n_variables}) where {T,n_nodes_out,n_nodes_in,n_variables}
         quote
           @boundscheck begin
             inbounds = (size(data_out, 1) >= $n_variables) &&
                       (size(data_out, 2) == size(data_out, 3) == size(data_out, 4) == $n_nodes_out) &&
                       (size(data_in, 1) >= $n_variables) &&
                       (size(data_in, 2) == size(data_in, 3) == size(data_in, 4) == $n_nodes_in)
             inbounds || throw(BoundsError())
           end

           # Interpolate in x-direction
           # tmp1 = zeros(eltype(data_out), n_variables, $n_nodes_out, $n_nodes_in, $n_nodes_in)
           @inbounds Base.Cartesian.@nexprs $n_nodes_in k -> begin
             Base.Cartesian.@nexprs $n_nodes_in j -> begin
               Base.Cartesian.@nexprs $n_nodes_out i -> begin
                 Base.Cartesian.@nexprs $n_variables v -> begin
                   tmp1_v_i_j_k = zero(eltype(data_out))
                   Base.Cartesian.@nexprs $n_nodes_in ii -> begin
                     tmp1_v_i_j_k += vandermonde[i, ii] * data_in[v, ii, j, k]
                   end
                 end
               end
             end
           end

           # Interpolate in y-direction
           # tmp2 = zeros(eltype(data_out), n_variables, $n_nodes_out, $n_nodes_out, $n_nodes_in)
           @inbounds Base.Cartesian.@nexprs $n_nodes_in k -> begin
             Base.Cartesian.@nexprs $n_nodes_out j -> begin
               Base.Cartesian.@nexprs $n_nodes_out i -> begin
                 Base.Cartesian.@nexprs $n_variables v -> begin
                   tmp2_v_i_j_k = zero(eltype(data_out))
                   Base.Cartesian.@nexprs $n_nodes_in jj -> begin
                     tmp2_v_i_j_k += vandermonde[j, jj] * tmp1_v_i_jj_k
                   end
                 end
               end
             end
           end

           # Interpolate in z-direction
           @inbounds Base.Cartesian.@nexprs $n_nodes_out k -> begin
             Base.Cartesian.@nexprs $n_nodes_out j -> begin
               Base.Cartesian.@nexprs $n_nodes_out i -> begin
                 Base.Cartesian.@nexprs $n_variables v -> begin
                   tmp3_v_i_j_k = zero(eltype(data_out))
                   Base.Cartesian.@nexprs $n_nodes_in kk -> begin
                     tmp3_v_i_j_k += vandermonde[k, kk] * tmp2_v_i_j_kk
                   end
                   data_out[v, i, j, k] = tmp3_v_i_j_k
                 end
               end
             end
           end

           return data_out
         end
       end

julia> function run_benchmarks_3d(;n_variables=4, n_nodes_in=4, n_nodes_out=n_nodes_in)
         data_in  = randn(n_variables, n_nodes_in,  n_nodes_in,  n_nodes_in)
         data_out = randn(n_variables, n_nodes_out, n_nodes_out, n_nodes_out)
         vandermonde_dynamic = randn(n_nodes_out, n_nodes_in)
         vandermonde_static  = SMatrix{n_nodes_out, n_nodes_in}(vandermonde_dynamic)

         println("multiply_coordinatewise_squeezed!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()

         println("multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_squeezed_avx!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         println("multiply_coordinatewise_sequential_avx!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_sequential_avx!($(data_out), $(data_in), $(vandermonde_static), $(n_variables)))
         println()

         println("multiply_coordinatewise_sequential_avx!, vandermonde_dynamic")
         display(@benchmark multiply_coordinatewise_sequential_avx!($(data_out), $(data_in), $(vandermonde_dynamic), $(n_variables)))
         println()

         println("multiply_coordinatewise_sequential_nexpr!, vandermonde_static")
         display(@benchmark multiply_coordinatewise_sequential_nexpr!($(data_out), $(data_in), $(vandermonde_static), $(Val(n_variables))))
         println()

         nothing
       end

julia> run_benchmarks_3d()
multiply_coordinatewise_squeezed!, vandermonde_static
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.938 μs (0.00% GC)
  median time:      11.176 μs (0.00% GC)
  mean time:        11.980 μs (0.00% GC)
  maximum time:     179.530 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed!, vandermonde_dynamic
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.776 μs (0.00% GC)
  median time:      12.793 μs (0.00% GC)
  mean time:        12.960 μs (0.00% GC)
  maximum time:     43.239 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed_avx!, vandermonde_static
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.056 μs (0.00% GC)
  median time:      15.419 μs (0.00% GC)
  mean time:        16.094 μs (0.00% GC)
  maximum time:     51.966 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

multiply_coordinatewise_squeezed_avx!, vandermonde_dynamic
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     952.609 ns (0.00% GC)
  median time:      995.391 ns (0.00% GC)
  mean time:        1.057 μs (0.00% GC)
  maximum time:     3.710 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     23

multiply_coordinatewise_sequential_avx!, vandermonde_static
BenchmarkTools.Trial:
  memory estimate:  4.34 KiB
  allocs estimate:  4
  --------------
  minimum time:     3.023 μs (0.00% GC)
  median time:      3.180 μs (0.00% GC)
  mean time:        3.447 μs (4.28% GC)
  maximum time:     303.146 μs (86.07% GC)
  --------------
  samples:          10000
  evals/sample:     8

multiply_coordinatewise_sequential_avx!, vandermonde_dynamic
BenchmarkTools.Trial:
  memory estimate:  4.34 KiB
  allocs estimate:  4
  --------------
  minimum time:     746.430 ns (0.00% GC)
  median time:      823.140 ns (0.00% GC)
  mean time:        1.124 μs (17.40% GC)
  maximum time:     41.617 μs (83.04% GC)
  --------------
  samples:          10000
  evals/sample:     107

multiply_coordinatewise_sequential_nexpr!, vandermonde_static
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     363.212 ns (0.00% GC)
  median time:      364.120 ns (0.00% GC)
  mean time:        368.505 ns (0.00% GC)
  maximum time:     772.255 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     208

This is still slower for me, curiously:

vandS = SMatrix{N,N}(vand);
multiply_coordinatewise_sequential_nexpr!(out4, data4, vandS, Val(4));
@test out4 ≈ series!(similar(out4), vand, data4)

@btime multiply_coordinatewise_sequential_nexpr!($out4, $data4, $vandS, Val(4)); # 489.523 ns (0 allocations)

Compared to 316.962 ns above (or 366.126 ns with your multiply_coordinatewise_sequential_avx!). But it does avoid needing a buffer. And it surely depends on the details of your computer, and of course on the size of the matrices.

What happens if you interpolate the Val(4)? Interpolating it should hand BenchmarkTools a pre-constructed constant rather than re-building the Val inside the timed expression. I.e. if you use

@btime multiply_coordinatewise_sequential_nexpr!($out4, $data4, $vandS, $(Val(4)));

instead of

@btime multiply_coordinatewise_sequential_nexpr!($out4, $data4, $vandS, Val(4));