SIMD.jl has some checks that are supposed to fix the undefined behavior:
But perhaps it is buggy?
That’s interesting, Julius. Thanks for that analysis.
I might consider that hacky approach in future, though I’m also interested to see if https://github.com/JuliaLang/julia/pull/44186 (kindly pointed out to me by Sukera) might remove the overhead that’s being such a pain here?
Likely! Shouldn’t these be y >= sizeof(T1)*8?
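To make the off-by-one concrete, here is a minimal sketch of the kind of guard that comparison implies. It is not SIMD.jl's actual code: `guarded_shl` is a hypothetical name, and it only handles scalar values with unsigned shift counts. The point is that LLVM's `shl` yields poison when the shift amount is equal to or larger than the bit width, so a shift by exactly `sizeof(T1)*8` must already take the fallback branch, which is why `>=` rather than `>` seems right.

```julia
# Hypothetical helper for illustration only (not taken from SIMD.jl).
# Shifts x left by y bits, returning zero when y is out of range.
function guarded_shl(x::T1, y::Unsigned) where {T1<:Base.BitInteger}
    nbits = sizeof(T1) * 8          # bit width of the value type
    # y == nbits is already out of range for a raw LLVM shl,
    # so the comparison has to be >= rather than >.
    y >= nbits && return zero(T1)
    return x << y                   # in-range shift, safe to lower directly
end

# Largest in-range shift for Int32 moves the low bit into the sign bit.
in_range = guarded_shl(Int32(1), UInt(31))
# A shift by exactly the bit width takes the guarded branch.
at_width = guarded_shl(Int32(1), UInt(32))
```

With a `>` check instead, the `UInt(32)` case would fall through to the raw shift, which is exactly the case the question above is pointing at.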