Here are my best attempts. One is just #48462 (which should become standard in v1.10) and the other is based on bit twiddling:
isfinite_v1_10(x::AbstractFloat) = !isnan(x - x)
isfinite_bittwiddle(x::T) where T<:Base.IEEEFloat = reinterpret(Unsigned,x) & Base.exponent_mask(T) != Base.exponent_mask(T)
using BenchmarkTools
x = randn(1000);
# x[345] = -Inf # enable if desired
@btime mapreduce(isfinite,&,x)
@btime mapreduce(isfinite_v1_10,&,x)
@btime mapreduce(isfinite_bittwiddle,&,x)
# all around 230ns on v1.8.0
@btime mapfoldl(isfinite,&,x)
@btime mapfoldl(isfinite_v1_10,&,x)
@btime mapfoldl(isfinite_bittwiddle,&,x)
# all around 125ns on v1.8.0
Note that mapfoldl was significantly faster for me (on v1.8) than mapreduce. mapreduce sometimes seems to perform poorly (possibly #48129, but maybe something different). None of the isfinite variants was any faster than another, so my re-definitions did not speed anything up.
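As a quick sanity check, the two re-definitions can be compared against Base's isfinite on a handful of special values. This is only a sketch: the `specials` helper and its list of edge cases are my own illustrative choices, and `Base.exponent_mask` and `Base.IEEEFloat` are unexported internals that may change between Julia versions.

```julia
# Definitions copied from the benchmark above
isfinite_v1_10(x::AbstractFloat) = !isnan(x - x)
isfinite_bittwiddle(x::T) where T<:Base.IEEEFloat =
    reinterpret(Unsigned, x) & Base.exponent_mask(T) != Base.exponent_mask(T)

# Hypothetical helper: a few edge cases for a given float type
specials(T) = T[0, -0.0, 1, floatmax(T), floatmin(T), nextfloat(zero(T)), Inf, -Inf, NaN]

for T in (Float16, Float32, Float64)
    for v in specials(T)
        # Inf - Inf and NaN - NaN are NaN; any finite x gives x - x == 0
        @assert isfinite_v1_10(v) == isfinite(v)
        # A value is non-finite exactly when all exponent bits are set
        @assert isfinite_bittwiddle(v) == isfinite(v)
    end
end
```

If the loop finishes without an AssertionError, all three definitions agree on these inputs, so the benchmark is comparing equivalent predicates.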
And here’s my best hand-written (read: ugly) fma version. Hopefully LoopVectorization could do something similar, but I didn’t want to mess with it.
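function allfinite(x)
    unroll = 16
    V = Val(unroll)
    z = zero(eltype(x))
    # begin vectorized segment
    # 0*x + s stays 0 for finite x and becomes NaN for Inf or NaN
    zz = ntuple(_->z, V)
    ss = zz
    startinds = eachindex(x)[begin:unroll:end-unroll+1]
    for i in startinds
        xx = ntuple(j -> @inbounds(x[i+j-1]), V)
        ss = fma.(zz, xx, ss)
    end
    ss == zz || return false
    # begin the tail (the whole array when it is shorter than `unroll`,
    # in which case `startinds` is empty and `startinds[end]` would throw)
    tailstart = isempty(startinds) ? firstindex(x) : last(startinds) + unroll
    s = z
    for i in tailstart:lastindex(x)
        s = fma(z, @inbounds(x[i]), s)
    end
    return s == z
end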
@btime allfinite($x)
# 70-90ns, depending on the need to check the tail
This looks like the winner. I can’t imagine a way to do this any faster except by unrolling more (which has its drawbacks). The inner loop of the @code_native output is just:
.LBB0_4: # %L295
# =>This Inner Loop Header: Depth=1
vfmadd231pd (%rax,%rdx,8), %ymm0, %ymm3 # ymm3 = (ymm0 * mem) + ymm3
vfmadd231pd 32(%rax,%rdx,8), %ymm0, %ymm4 # ymm4 = (ymm0 * mem) + ymm4
vfmadd231pd 64(%rax,%rdx,8), %ymm0, %ymm1 # ymm1 = (ymm0 * mem) + ymm1
vfmadd231pd 96(%rax,%rdx,8), %ymm0, %ymm2 # ymm2 = (ymm0 * mem) + ymm2
addq $16, %rdx
cmpq %rdx, %rcx
jne .LBB0_4
If you don’t expect to find any !isfinite values then I wouldn’t bother checking for an early exit, except maybe before the tail.
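For anyone wondering why allfinite above needs no per-element check: multiplying by zero maps every finite value to a (signed) zero but turns Inf, -Inf and NaN into NaN, and a NaN accumulator stays NaN for the rest of the loop. A minimal sketch of that identity follows; `fma_isfinite` and `vals` are hypothetical names used only for illustration.

```julia
# Hypothetical one-element version of the accumulator update in allfinite
fma_isfinite(v) = fma(zero(v), v, zero(v)) == zero(v)

vals = (1.0, -2.5, 0.0, -0.0, floatmax(Float64), nextfloat(0.0), Inf, -Inf, NaN)
for v in vals
    # 0*v + 0 is 0 for finite v and NaN otherwise, and NaN == 0 is false
    @assert fma_isfinite(v) == isfinite(v)
end

# NaN is sticky: later finite inputs cannot reset the accumulator,
# which is why a single comparison after the loop is enough.
s = fma(0.0, Inf, 0.0)
s = fma(0.0, 1.0, s)
@assert isnan(s)
```

The stickiness is also what makes the early exit optional: skipping it changes only how soon a non-finite value is reported, not the result.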