I see, thanks! @mforets, see the README of BenchmarkTools.jl (JuliaCI/BenchmarkTools.jl on GitHub). With that, the broadcast is slightly faster. For n=100_000, axpy is faster because of multithreading. For n=1_000_000 it still burns all my cores, but takes exactly the same time as broadcasting. I officially give up on understanding multithreading performance.
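For anyone who wants to reproduce the comparison, here is a rough sketch of how it could be set up. It assumes the README pointer is about interpolating variables with `$` so that global-variable overhead is not measured. The scalar `a` and the random vectors are made up for illustration, and timings will depend on your machine and BLAS settings.

```julia
using BenchmarkTools, LinearAlgebra

n = 1_000_000      # vector length; the post also mentions n = 100_000
a = 2.0            # hypothetical scalar, not from the original post
x = rand(n)
y = rand(n)

# Fused in-place broadcast; runs on a single thread.
# The `$` interpolation keeps global-variable lookup out of the timing.
@btime $y .+= $a .* $x;

# BLAS axpy! computes y = a*x + y in place; whether it uses several
# threads depends on the BLAS library and its thread setting.
@btime axpy!($a, $x, $y);

# Number of threads BLAS is currently allowed to use.
println(BLAS.get_num_threads())
```

Both calls update `y` in place, so its values grow across benchmark evaluations. That does not affect the timing, but copy `y` first if you need the original values. To check whether BLAS threading explains a difference, one option is to call `BLAS.set_num_threads(1)` and rerun the second benchmark.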