Here’s a version using LoopVectorization.jl’s vmapreduce, with the above bug fixed. It’s more concise, but about 2x slower (still faster than the manual vectorization, though). Perhaps the buggy version’s extra 2x speed comes from fastmath?
using LoopVectorization
allfinite_turbo2(x) = vmapreduce(xi -> xi*zero(xi), +, x) == 0  # xi*zero(xi) is 0 for finite xi and NaN for NaN/Inf, so the sum is 0 iff all entries are finite
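(For reference, allfinite_turbo below is the earlier @turbo-looped version; this is only a rough sketch of what it might look like, not the exact definition from the earlier post:)
using LoopVectorization

# Hypothetical sketch of the @turbo-looped allfinite_turbo. @turbo applies
# fastmath-style transformations, which presumably lets it fold x[i]*zero(x[i])
# to 0 and is where the NaN bug comes from.
function allfinite_turbo(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i] * zero(x[i])
    end
    return s == 0
end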
We can see the issue goes away compared to the manually-looped allfinite_turbo:
julia> x = @MVector[NaN]
1-element MVector{1, Float64} with indices SOneTo(1):
NaN
julia> allfinite_turbo(x)
true
julia> allfinite_turbo2(x)
false
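The same check should also catch Inf, not just NaN, since Inf*zero(Inf) is NaN:
allfinite_turbo2([1.0, Inf])   # should return false (Inf*0.0 is NaN, so the reduction is not 0)
allfinite_turbo2([1.0, 2.0])   # should return true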
and the speed difference:
julia> @btime allfinite_turbo(x) setup=(x=randn(1000));
86.541 ns (0 allocations: 0 bytes)
julia> @btime allfinite_turbo2(x) setup=(x=randn(1000));
156.029 ns (0 allocations: 0 bytes)
compared to some other strategies in this thread:
julia> f1(x)=all(isfinite,x); @btime f1(x) setup=(x=randn(1000));
425.040 ns (0 allocations: 0 bytes)
julia> f2(x)=isfinite(sum(xi->xi*zero(xi),x)); @btime f2(x) setup=(x=randn(1000));
303.120 ns (0 allocations: 0 bytes)
julia> f3(x)=isfinite(sum(x)); @btime f3(x) setup=(x=randn(1000));
229.855 ns (0 allocations: 0 bytes)
(Note that f3 is unsafe: the sum can overflow to Inf even when every element is finite, making f3 return false on an all-finite input.)
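To illustrate the overflow issue, two finite values can already sum to Inf, so f3 reports non-finite even though every element is finite, while f1 and f2 stay correct:
x = fill(floatmax(Float64), 2)   # two finite entries whose sum overflows to Inf
f1(x)   # true  -- all entries are finite
f2(x)   # true  -- xi*zero(xi) is 0.0 for every entry, and isfinite(0.0) holds
f3(x)   # false -- sum(x) == Inf, so isfinite(sum(x)) fails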