Interesting. Mind showing the `@code_native debuginfo=:none` and/or the `@code_llvm debuginfo=:none` output of both?
The fact that `sum` didn’t improve suggests it is still just using `ymm` registers on Julia 1.6.
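Here is a minimal sketch of collecting both listings programmatically and roughly counting vector register usage. It assumes the benchmarked function takes a `Vector{Float64}`. `mysum` is a hypothetical stand-in for the actual function, and `reflect_to_string`, `register_counts` and `llvm_vector_counts` are hypothetical helpers, not part of any package.

On x86, `ymm` registers are 256 bits wide and `zmm` registers (AVX-512) are 512 bits. So `<4 x double>` in the LLVM IR corresponds to ymm-width vectors and `<8 x double>` to zmm-width ones. The counts are plain substring matches, so treat them as a rough signal rather than a precise measure.

```julia
using InteractiveUtils  # code_native / code_llvm; loaded automatically in the REPL

# Hypothetical stand-in for the benchmarked function: a simple @simd reduction
# that LLVM is free to vectorize.
function mysum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Hypothetical helper: capture the output of a reflection function such as
# code_native or code_llvm as a String, without debug-info comments.
function reflect_to_string(reflect, f, types)
    io = IOBuffer()
    reflect(io, f, types; debuginfo=:none)
    return String(take!(io))
end

# Count x86 vector register names in the assembly (substring matches).
# count(::String, ::String) requires Julia 1.3 or later.
function register_counts(asm::AbstractString)
    return Dict(reg => count(reg, asm) for reg in ("xmm", "ymm", "zmm"))
end

# Count LLVM vector types of Float64 by lane count: 4 lanes fill a ymm register,
# 8 lanes a zmm register.
function llvm_vector_counts(ir::AbstractString)
    return Dict(w => count("<$w x double>", ir) for w in (2, 4, 8))
end

asm = reflect_to_string(code_native, mysum, (Vector{Float64},))
ir  = reflect_to_string(code_llvm, mysum, (Vector{Float64},))
println("native register mentions:   ", register_counts(asm))
println("LLVM <N x double> mentions: ", llvm_vector_counts(ir))
```

Running the same script on both Julia versions and comparing the counts should show whether either one emits `zmm` registers for this loop.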
Then, mind also showing the values of `VectorizationBase.REGISTER_COUNT` and `VectorizationBase.REGISTER_SIZE` for both versions?
`VectorizationBase.REGISTER_SIZE` must now agree with LLVM (if it didn’t, you’d get crashes), but I’m worried about `REGISTER_COUNT`. Depending on those results, I’ll have something else for you to run to check whether it is wrong.
If it is wrong, code will still run, just much more slowly. That makes it a possible explanation for the performance regression, which is why I want to look into it.