What's the problem with this simple multi-thread code?

@inbounds needs to go inside @threads; see "What is the cause of this performance difference between Julia and Cython? - #4 by tkf".

In general, try minimizing the scope of @inbounds.
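As a rough illustration of the placement, here is a minimal sketch. The function `square_all!` and its arguments are hypothetical and not from the original code. My understanding is that @threads wraps the loop body in a closure, so an @inbounds placed outside the macro does not reach the indexing expressions. Putting it on the single indexing statement also keeps its scope small.

```julia
using Base.Threads

# Hypothetical example function: writes xs[i]^2 into ys[i].
function square_all!(ys, xs)
    # Check the shapes once up front so that skipping bounds checks below is safe.
    @assert axes(ys) == axes(xs)
    @threads for i in eachindex(xs, ys)
        # @inbounds sits inside the threaded loop body and covers only this statement.
        @inbounds ys[i] = xs[i]^2
    end
    return ys
end

xs = collect(1.0:10.0)
ys = similar(xs)
square_all!(ys, xs)
```

Start Julia with more than one thread (for example `julia -t 4`) to see any effect from @threads.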

I think this needs @inbounds for a fair comparison with LoopVectorization.jl and Tullio.jl. The same applies to the @threads and @floop versions.

The GC may reuse the same memory region. A more robust approach may be to allocate many arrays whose total size is at least (say) twice the L3 cache size, and then use them one by one.
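A minimal sketch of that idea follows. The cache size `L3_BYTES`, the helper `make_pool`, and the use of `sum` as the benchmarked function are all assumptions for illustration; substitute the L3 size of your machine and the function you actually measure.

```julia
# Assumed L3 cache size (32 MiB); a hypothetical value, adjust for your machine.
const L3_BYTES = 32 * 2^20

# Hypothetical helper: allocate enough length-n arrays that their total size
# is at least min_bytes (by default twice the assumed L3 size).
function make_pool(n; T = Float64, min_bytes = 2 * L3_BYTES)
    bytes_per_array = n * sizeof(T)
    count = cld(min_bytes, bytes_per_array)  # round up
    return [rand(T, n) for _ in 1:count]
end

pool = make_pool(10_000)

# Warm up on a separate array so compilation time is not measured.
sum(rand(10))

# Time one call per array, so each measurement uses an array that has not been touched recently.
times = [@elapsed(sum(x)) for x in pool]
minimum(times)
```

Because all arrays in the pool stay referenced, the GC cannot hand the same memory region back for the next measurement.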
