Good point. After pulling that out, distances_cast_avx is down to 113 ms, but distances_threaded_simd is hardly changed, and distances_tullio not at all. I think LoopVectorization is happy to reorder loops and to pull such functions out of them, and I guess it succeeded here.
If LoopVec does realize that, it's a really nice feature. I wonder whether it's guaranteed, though, because if it fails it can be a massive loss.
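A minimal sketch of doing the hoisting by hand, so the result no longer depends on the optimizer spotting it. The function names and the `sqrt(x)` call are hypothetical stand-ins for illustration, not the actual distance code discussed here.

```julia
# Hypothetical example: sqrt(x) does not depend on the loop index i.
function sum_scaled_naive(v, x)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i] * sqrt(x)   # recomputed every iteration unless the compiler hoists it
    end
    return s
end

# Same computation with the loop-invariant value pulled out manually.
function sum_scaled_hoisted(v, x)
    c = sqrt(x)               # computed once, before the loop
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i] * c
    end
    return s
end
```

Both return the same value; the hoisted version just makes the "compute once" behaviour explicit instead of relying on a macro or the compiler to do it.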