If you just put `@simd` in front of the loop in `sumproduct_two`, then it matches or exceeds the speed of `dot` on my machine. For such a small length (100), it is actually faster than `dot` because the BLAS call in `dot` imposes some additional overhead.
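For reference, here is a minimal sketch of what the annotated loop might look like; the exact definition of `sumproduct_two` comes from the earlier thread, so the signature and loop body below are assumptions for illustration:

```julia
using LinearAlgebra: dot

# Assumed shape of sumproduct_two, with @simd added in front of the loop.
# @inbounds is also used so bounds checks don't block vectorization.
function sumproduct_two_simd(x, y)
    s = zero(promote_type(eltype(x), eltype(y)))
    @inbounds @simd for i in eachindex(x, y)
        s += x[i] * y[i]
    end
    return s
end

x, y = rand(100), rand(100)
sumproduct_two_simd(x, y) ≈ dot(x, y)  # true, up to floating-point rounding
```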
Compilers can’t do `@simd` automatically for this sum because it changes the answers slightly (by re-ordering the additions).
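Floating-point addition is not associative, which is why the compiler isn’t allowed to re-order it on its own. For example:

```julia
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # false: the two orderings give slightly different results
```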
For long vectors, the `sum` version is almost certainly more accurate, because it uses pairwise summation whereas the BLAS call in `dot` probably does a “naive” sum. This won’t affect you for length-100 vectors, however, since the pairwise algorithm only turns on for length > 1024 (for performance reasons).
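If you want to see the accuracy difference for yourself, a quick comparison against a higher-precision reference looks like this (the vector and its length are illustrative, not from the original benchmark):

```julia
x = rand(Float32, 10^7)             # illustrative long vector

exact    = sum(Float64.(x))         # higher-precision reference
pairwise = sum(x)                   # Base's sum uses pairwise summation for long vectors
naive    = foldl(+, x)              # strict left-to-right ("naive") accumulation

abs(pairwise - exact), abs(naive - exact)  # the naive error is typically much larger
```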
However, note that you are talking about a different function than the OP in this old thread (`sum` rather than `cumsum`), so I will split it off into a new thread. In general, please be more reluctant to revive ancient threads rather than starting new ones (and cross-referencing/linking older threads as appropriate).