It’s worth noting that the @simd annotation on the innermost for loop doesn’t do anything because of the if branch.
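As a rough illustration of the point about the branch, here is a minimal sketch that assumes a made-up kernel (summing the entries above a threshold), since the original function isn't shown here. The names sum_above_branch and sum_above_ifelse are hypothetical. The second version replaces the if with ifelse, which evaluates both arguments and selects one, so the loop body has no control flow. Whether the compiler actually vectorizes either version depends on the real loop body, so it's worth checking with @code_llvm rather than assuming.

```julia
# Hypothetical kernel, not the function from the original post.
# Version with a branch inside the @simd loop.
function sum_above_branch(x::Vector{Float64}, t::Float64)
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        if x[i] > t
            s += x[i]
        end
    end
    return s
end

# Branch-free version: ifelse computes both values and picks one,
# so there is no control flow left in the loop body.
function sum_above_ifelse(x::Vector{Float64}, t::Float64)
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += ifelse(x[i] > t, x[i], 0.0)
    end
    return s
end
```

You can compare the two with something like @code_llvm sum_above_ifelse(rand(1000), 0.5) and look for vector types such as <4 x double> in the output.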
Now, I know you said this function is being called in a multi-threaded loop, but have you considered using threads inside this function as well? If the outer multi-threaded loop uses Threads.@spawn (or anything derived from it) instead of Threads.@threads, then you shouldn’t get any destructive interference from the nested multi-threading, and you may see performance improvements if there’s any waiting happening in the outer threaded loop.
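To make the nested pattern concrete, here is a minimal sketch, assuming the work can be split into independent chunks. The functions inner_sum and outer, and the ntasks keyword, are hypothetical stand-ins for your own code. Both levels use Threads.@spawn, so every piece of work becomes a task that the scheduler can place on whichever thread is free.

```julia
using Base.Threads: @spawn

# Hypothetical inner function: splits its input into chunks and
# spawns one task per chunk, then combines the partial results.
function inner_sum(f, xs::AbstractVector; ntasks::Int = Threads.nthreads())
    isempty(xs) && return 0.0
    chunks = Iterators.partition(xs, cld(length(xs), ntasks))
    tasks = [@spawn sum(f, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

# Hypothetical outer loop: one task per batch. Each of these tasks
# spawns further tasks inside inner_sum.
function outer(f, batches)
    tasks = [@spawn inner_sum(f, b) for b in batches]
    return fetch.(tasks)
end
```

For example, outer(abs2, [[1.0, 2.0], [3.0]]) returns the sum of squares of each batch. Start Julia with more than one thread (for instance julia -t auto) for the tasks to actually run in parallel.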