Efficient creation of power series matrix or array of arrays

bennedich · December 28, 2018, 2:37am

Nice! I posted this method above btw, but in a vector format instead of a matrix.

I find this statement a bit odd/misleading: I consider cache locality crucial in algorithms like this, and not at all a micro-optimization. To see why, swap the two for loops so that they read `for j = 1:m, i = 2:n`, then re-run, and you'll see the performance drop 10-fold! Same algorithm, same number of floating-point operations, but it executes 10 times slower. That's only twice as fast as the naive solution, meaning that cache locality can be more important than picking the right algorithm. (The reason for this is explained in more detail in the link in my previous post.)
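For anyone who wants to try the loop-order experiment, here is a minimal sketch. It is not the code from the post being replied to: the function names `powers_colmajor` and `powers_rowmajor` and the layout `A[j, i] = x[j]^(i-1)` are my assumptions, chosen so that the loop headers match the ones discussed above. Julia arrays are column-major, so the inner loop should run over the first index.

```julia
# Assumed layout (hypothetical, not taken from the original post):
# A is m×n with A[j, i] = x[j]^(i-1), each column built from the previous one.

function powers_colmajor(x::AbstractVector{T}, n::Integer) where {T}
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    m = length(x)
    A = Matrix{T}(undef, m, n)
    A[:, 1] .= one(T)
    # In `for i = 2:n, j = 1:m` the last range is the innermost loop,
    # so j walks down a column: contiguous memory access.
    @inbounds for i = 2:n, j = 1:m
        A[j, i] = A[j, i-1] * x[j]
    end
    return A
end

function powers_rowmajor(x::AbstractVector{T}, n::Integer) where {T}
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    m = length(x)
    A = Matrix{T}(undef, m, n)
    A[:, 1] .= one(T)
    # Loops swapped: the inner loop i jumps across columns (stride m),
    # so each access is likely to touch a different cache line.
    @inbounds for j = 1:m, i = 2:n
        A[j, i] = A[j, i-1] * x[j]
    end
    return A
end
```

Both functions return the same matrix; only the memory access pattern differs. Timing them on a large input (for example with `@time`, after a warm-up call) should show the gap, though the exact factor will depend on the matrix size and the machine.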

SIMD can also lead to enormous improvements. There was a recent topic where we played with vectorized and branchless code and were able to make some sample code several hundred times faster. The code you posted will already be SIMD-vectorized (and loop-unrolled) automatically by the compiler btw, so it's not something you need to enable manually.
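One way to check whether a loop was auto-vectorized is to look at the generated LLVM IR for vector types such as `<4 x double>`. The sketch below does that for a small stand-in loop; `scale_column!` is a hypothetical helper of mine, not the code from this thread, and whether vector instructions show up depends on the CPU and the Julia version.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical stand-in for one column update: dst[j] = src[j] * x[j].
function scale_column!(dst, src, x)
    @inbounds for j in eachindex(dst, src, x)
        dst[j] = src[j] * x[j]
    end
    return dst
end

# Capture the optimized LLVM IR as a string and search it for vector types.
ir = sprint(code_llvm, scale_column!,
            (Vector{Float64}, Vector{Float64}, Vector{Float64}))
has_vector_ops = occursin(r"<\d+ x double>", ir)
println("vector instructions found: ", has_vector_ops)
```

At the REPL, `@code_llvm scale_column!(dst, src, x)` prints the same IR directly.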

But perhaps you knew all of this already, and meant that although your code is already high-performing, you could still optimize it further…
