I don't think it is an "official way". The official way would be to have the performance regression properly fixed. I just noticed that this particular rewrite fixed the example you gave and offered it as a potential workaround while the issue is being looked at more carefully.
OK done, commit fe1253ee258674844b8c0350deb05018909e823e is the first bad commit. But looking at what has changed I really do not see why it breaks vectorization.
Here is the output of git bisect log:
git bisect start
# good: [2e3364e02f1dc3777926590c5484e7342bc0285d] [loader]: Re-export symbols for C embedding, rename to `libjulia-internal` (#38160)
git bisect good 2e3364e02f1dc3777926590c5484e7342bc0285d
# good: [7c17bb361e859aa834034c977ca683a78f17d506] Add examples for endswith and startswith (#38255)
git bisect good 7c17bb361e859aa834034c977ca683a78f17d506
# bad: [9631a9fcee01643ccffc8e3c4a7f34b659fa2580] add __CET__ check guards to trampoline assembly (#38683)
git bisect bad 9631a9fcee01643ccffc8e3c4a7f34b659fa2580
# good: [49b8e61a80b8108ca0a23f8075a0d0508b6947c7] Fix out-of-tree compilation of loader library. (#38677)
git bisect good 49b8e61a80b8108ca0a23f8075a0d0508b6947c7
# bad: [8ffcc0ea9274203420e407d0f921cdd4c346fa22] Fix `stdlib/Makefile` rules for JLLs (#38688)
git bisect bad 8ffcc0ea9274203420e407d0f921cdd4c346fa22
# bad: [fe1253ee258674844b8c0350deb05018909e823e] fix #38664, regression in `===` codegen for Bool (#38686)
git bisect bad fe1253ee258674844b8c035
You can. I am pretty surprised about that bisection but stranger things have happened. Please add enough information so that someone could reproduce the slowdown to confirm the identified commit.
I was also surprised when I looked at the tiny changes made by this commit. But I double-checked that the previous commit has no such issue and that the issue is present after this commit. I will write a small test case to demonstrate this.
From my understanding, it would be a different issue, considering that adding the @inbounds and/or @simd annotations didn't bring any significant change in performance. For example, the following displays the same slowdown:
function iter_2(X_bin, hist, δ, 𝑖)
    hist .= 0.0
    @inbounds @simd for i in CartesianIndices(𝑖)
        @inbounds @simd for k in 1:3
            hist[k, X_bin[𝑖[i],1], 1] += δ[𝑖[i],k]
        end
    end
end
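To help others reproduce the timing, here is a minimal, self-contained harness for a loop of this shape. It is only a sketch: the array sizes, element types, the names `iter_2_sketch!`, `idx` and `delta`, and the random data are all assumptions, not the original benchmark. Only the inner loop is marked `@simd` here, because different observations can update the same histogram bin, so the outer iterations are not independent.

```julia
using Random

# Sketch of the accumulation loop with ASCII names (idx = observation indices,
# delta = per-observation values). All sizes and data below are hypothetical.
function iter_2_sketch!(hist, X_bin, delta, idx)
    fill!(hist, 0.0)
    @inbounds for i in eachindex(idx)
        r = idx[i]          # row of the observation
        b = X_bin[r, 1]     # histogram bin of feature 1 for that row
        @simd for k in 1:3
            hist[k, b, 1] += delta[r, k]
        end
    end
    return hist
end

Random.seed!(1)
nobs, nbins, nfeats = 100_000, 32, 10
X_bin = rand(UInt8(1):UInt8(nbins), nobs, nfeats)   # binned features
delta = rand(nobs, 3)
hist = zeros(3, nbins, nfeats)
idx = randperm(nobs)[1:div(nobs, 2)]                # random subset of rows

iter_2_sketch!(hist, X_bin, delta, idx)             # warm-up (compilation)
t = @elapsed iter_2_sketch!(hist, X_bin, delta, idx)
println("Julia ", VERSION, ": ", round(t * 1e3, digits = 3), " ms")
```

Running the same script with both builds and comparing the printed times should show whether the slowdown is present for this pattern on a given machine.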
I changed the line in abstractarray.jl according to your PR (and rebuilt both versions of Julia). It does not change my timings, although it does not hurt either. Really, the only difference between version 1.6.0-DEV.1647 and version 1.6.0-DEV.1648 is the modifications around line 2582 of src/codegen.cpp.
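When comparing two locally built versions, it can help to print the exact build alongside each timing. A small sketch, assuming `Base.GIT_VERSION_INFO` (an internal, non-exported constant) is available in the build; the helper name `build_summary` is hypothetical:

```julia
# Summarize the running build so that a timing can be tied to an exact commit.
# Base.GIT_VERSION_INFO is internal to Julia and may change between releases.
function build_summary()
    info = Base.GIT_VERSION_INFO
    return (version = VERSION, commit = info.commit, build_number = info.build_number)
end

println(build_summary())
```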
In case it helps to figure out the origin of the issue, I benchmarked a function that combines the computed weights into a single value (instead of storing them in some destination array), and it is as fast with either version of Julia.
function sum_prod_weights(src::Array{T,1}) where {T<:AbstractFloat}
    s = zero(T)
    @inbounds @simd for i in eachindex(src)
        w1, w2, w3, w4 = compute_weights(src[i])
        s += w1*w2*w3*w4
    end
    return s
end
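For comparison, the two variants (reducing to a single value versus storing into a destination array) can be put side by side as below. This is a sketch under assumptions: `compute_weights` is not shown in the thread, so a hypothetical stand-in (cubic B-spline weights for an offset in [0, 1]) is used, and `store_weights!` is an illustrative name for the storing variant, not the original function.

```julia
# Hypothetical stand-in for compute_weights: cubic B-spline weights at offset t.
@inline function compute_weights(t::T) where {T<:AbstractFloat}
    u = one(T) - t
    w1 = u^3 / 6
    w2 = (3 * t^3 - 6 * t^2 + 4) / 6
    w3 = (-3 * t^3 + 3 * t^2 + 3 * t + 1) / 6
    w4 = t^3 / 6
    return w1, w2, w3, w4
end

# Reducing variant, as in the post above.
function sum_prod_weights(src::Array{T,1}) where {T<:AbstractFloat}
    s = zero(T)
    @inbounds @simd for i in eachindex(src)
        w1, w2, w3, w4 = compute_weights(src[i])
        s += w1 * w2 * w3 * w4
    end
    return s
end

# Storing variant (hypothetical): one column of dst per element of src.
function store_weights!(dst::Array{T,2}, src::Array{T,1}) where {T<:AbstractFloat}
    @assert size(dst) == (4, length(src))
    @inbounds @simd for i in eachindex(src)
        w1, w2, w3, w4 = compute_weights(src[i])
        dst[1, i] = w1
        dst[2, i] = w2
        dst[3, i] = w3
        dst[4, i] = w4
    end
    return dst
end

src = rand(Float32, 1_000_000)
dst = Array{Float32}(undef, 4, length(src))

sum_prod_weights(src); store_weights!(dst, src)     # warm-up (compilation)
t_sum = @elapsed sum_prod_weights(src)
t_store = @elapsed store_weights!(dst, src)
println("reduce: ", t_sum, " s, store: ", t_store, " s")
```

If only the storing variant slows down between the two builds, that would point at the stores rather than at the weight computation itself.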