Fastest way to check for Inf or NaN in an array?

What is the fastest way to check for Inf or NaN in an array? This has surprisingly become the bottleneck in one of my codes, and I’m looking for the fastest possible strategy.

I noticed that running sum on an array and then checking whether the result is finite is faster than doing any(isnan.(x)) on the array, which I found weird, although I realize that sum is extremely optimized and any probably isn't. It also matters that my arrays will contain very few NaNs/Infs, so any has to search through the entire array.

In fact, using sum instead of any is more efficient on a very sparse array in general:

fast_any(x) = sum(x) > 0
x = zeros(Bool, 10000)

@btime any(x)  #   3.120 μs
@btime fast_any(x) #  287.138 ns

A whole order of magnitude… (I note that when the array is not sparse, any is the clear winner. But for an array with very few true values, sum is faster, which I find odd.)
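As a sanity check on the shortcut (fast_any as defined above): sum over a Bool array counts the true entries, so a positive sum is exactly equivalent to any. A minimal sketch:

```julia
# Sketch: for a Bool array, sum counts the `true` entries,
# so a positive sum is equivalent to `any`.
fast_any(x) = sum(x) > 0

x = zeros(Bool, 10_000)
@assert fast_any(x) == any(x) == false
x[123] = true
@assert fast_any(x) == any(x) == true
```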

Here are my current implementations of Inf/NaN checkers, with their speeds given below:

function simple_check(x)
    any(isnan.(x)) || any(.!isfinite.(x))
end

function simd_check(x, n)
    isnan_or_inf = Array{Bool, 1}(undef, n)
    @inbounds @simd for j=1:n
        isnan_or_inf[j] = isnan(x[j]) || !isfinite(x[j])
    end
    any(isnan_or_inf)
end

function sum_check(x)
    s = sum(x)
    isnan(s) || !isfinite(s)
end

The timings (using x = randn(Float32, 1000), i.e., an array with no non-finite values) are:

julia> @btime simple_check(x)
  649.153 ns (6 allocations: 8.81 KiB)
julia> @btime simd_check(x, 1000)
  558.973 ns (1 allocation: 1.06 KiB)
julia> @btime sum_check(x)
  90.423 ns (0 allocations: 0 bytes)

Now, obviously I don’t actually need the sum of this array, but using the sum operation is significantly faster than the alternatives! (Note that summing over an Inf or NaN will produce another non-finite value.)
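To confirm the propagation property the trick relies on, a quick sketch: a single non-finite element makes the whole sum non-finite.

```julia
# One non-finite element poisons the entire sum.
@assert isfinite(sum([1.0, 2.0, 3.0]))
@assert !isfinite(sum([1.0, 2.0, Inf]))   # Inf propagates
@assert isnan(sum([1.0, 2.0, NaN]))       # NaN propagates
```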

Is there a faster way to check for NaNs and Infs? I note that my arrays will seldom actually contain one, so the method should be optimized for that case.

Edit #1: Updated times:

julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  321.970 ns (0 allocations: 0 bytes)
julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 1000)
  80.913 ns (0 allocations: 0 bytes)

and, if overflows are an issue:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 1000)
  229.604 ns (1 allocation: 4.06 KiB)

(since 0 * Inf = NaN)
so sum is still faster…


To avoid allocation:

any(isnan, x)

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end
simple_check (generic function with 1 method)

julia> @btime simple_check(x) setup=x=zeros(Bool, 10000)
  1.142 ns (0 allocations: 0 bytes)
false

First of all, NaN isn’t finite, so you get that check for free: isfinite is the only check you need.
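To illustrate, isfinite already subsumes the NaN check:

```julia
# isfinite rejects NaN as well as ±Inf,
# so a separate isnan check is redundant.
@assert !isfinite(NaN) && !isfinite(Inf) && !isfinite(-Inf)
@assert isfinite(1.0) && isfinite(0.0f0)
```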


btw don’t you just want:

all(isfinite, x)

?


Also, your sum check is incorrect if the sum of x overflows.
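For example (using floatmax to force the overflow):

```julia
# All elements are finite, yet the sum overflows to Inf,
# so isfinite(sum(x)) reports a false positive.
x = fill(floatmax(Float32), 10)
@assert all(isfinite, x)
@assert !isfinite(sum(x))
```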


I get this for all:

julia> @btime all(isfinite, x)
  129.762 ns (0 allocations: 0 bytes)

and similar for any(isnan, x).

Actually this one looks like the current winner, thanks @jling:

(Edit: sum is faster, see below)

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end
julia> @btime simple_check(x)
  8.634 ns (0 allocations: 0 bytes)

they should be the same speed:

julia> @btime all(isfinite, x) setup=x=zeros(Bool, 10000)
  1.172 ns (0 allocations: 0 bytes)
true

what Julia version?

Good point @Oscar_Smith, I guess isfinite is all I need. And good point about overflows.

It’s still weird how sum and any are comparable though. Shouldn’t any be significantly faster for a Bool array?

For Bool arrays, Julia is able to prove that the sum must be an integer, so it is always finite, while the any approach isn’t getting fully constant folded.
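Concretely, the type-level argument looks like this (a sketch; the constant folding itself happens inside the compiler):

```julia
# sum over a Bool array promotes to Int (a count of trues),
# which can never be NaN or Inf, so the whole check folds to false.
x = rand(Bool, 1_000)
s = sum(x)
@assert s isa Int
@assert isfinite(s) && !isnan(s)
```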


@jling My benchmarks of each implementation use x=randn(Float32, 1000), hence the difference in performance. I am using the 1.8 dev version.

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end
simple_check (generic function with 1 method)


julia> @btime simple_check(x) setup=x=randn(Float32, 1000)
  690.549 ns (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  466.772 ns (0 allocations: 0 bytes)
true

Not sure why (probably I’m doing something wrong), but I’m not seeing the same. By the way, you should modify simple_check to use only !isfinite, just like the all version.

(edit: updated times below)

x2 = rand(Bool, 100000);
@code_typed sum_check(x2)
CodeInfo(
1 ─ %1 = Base.identity::typeof(identity)
│        invoke Base._simple_count(%1::typeof(identity), x::Vector{Bool}, 0::Int64)::Int64
└──      return false
) => Bool

Yeah, the sum_check is constant-foldable to false.


Not sure how to interpret this, sorry. Is Base._simple_count faster than any for some unfair reason?

But yeah looks like the expression any(xi -> !isfinite(xi), x) is quite fast - thanks!

Edit: Nevermind, sum is still faster. See below.

Note that I’m inputting a Float32 array into sum_check. The Bool example was just for any vs sum in general, not for NaN checking.

@jling I ran it again and now I’m seeing comparable times. No idea what went wrong in my post above, sorry about that.


julia> @btime simple_check(x)
  3.130 μs (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x)
  3.260 μs (0 allocations: 0 bytes)
true

(10000-element array now)

Actually, nevermind. Still an unsolved mystery. See below:

julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
  3.109 μs (0 allocations: 0 bytes)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
  795.135 ns (0 allocations: 0 bytes)
true

Regarding the potential overflow @Oscar_Smith, I can actually do the following, and still get better performance with sum:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
  1.450 μs (2 allocations: 39.11 KiB)

Since 0 times Inf is NaN, this technically is valid.
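A sketch of why the multiply-by-zero trick is overflow-proof:

```julia
# Multiplying by zero maps every finite value to exactly 0 and
# maps Inf/NaN to NaN, so the scaled sum cannot overflow but
# still flags non-finite entries.
@assert isnan(0.0f0 * Inf32) && isnan(0.0f0 * NaN32)
x = fill(floatmax(Float32), 100)          # finite, but sum overflows
@assert !isfinite(sum(x))                 # plain sum: false positive
@assert isfinite(sum(xi -> xi * 0, x))    # scaled sum: correct
```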

So, sum is still the way to go it seems. Not sure why it’s faster…

With LoopVectorization you can go even faster.

using LoopVectorization

function h(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i]*0
    end
    return isfinite(s)
end

I presume this is because all / any exit early if they can, which might prevent SIMD?

julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
  min 3.125 μs, mean 3.180 μs (0 allocations)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
  min 804.945 ns, mean 813.873 ns (0 allocations)
true

julia> @btime all(isfinite, x) setup=(x=randn(Float32, 10000); x[33] = Inf)  # much faster
  min 12.387 ns, mean 12.580 ns (0 allocations)
false

julia> @btime isfinite(sum(x)) setup=(x=randn(Float32, 10000); x[33] = Inf)  # the same
  min 804.945 ns, mean 811.919 ns (0 allocations)
false

Also note that you can multiply by zero without the allocation:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
  min 1.429 μs, mean 2.423 μs (2 allocations, 39.11 KiB)
true

julia> @btime isfinite(sum(y -> y*0, x)) setup=x=randn(Float32, 10000)
  min 949.261 ns, mean 962.644 ns (0 allocations)
true

Here is the absolute fastest version. It uses a combination of early exits and vectorization to get maximum performance.

using LoopVectorization

function y(x)
    for i in firstindex(x):64:(lastindex(x)-63)
        s = zero(eltype(x))
        @turbo for j in 0:63
            s += x[i+j]*0
        end
        !isfinite(s) && return false
    end
    return all(isfinite, @view x[max(end-64,begin):end])
end

(for sparse arrays, you want to do the same thing, but on nonzeros(x) instead of x)
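A minimal sketch of the sparse variant (has_only_finite is a hypothetical helper name; the key point is that nonzeros exposes only the stored entries):

```julia
using SparseArrays

# Check only the stored entries; the implicit zeros are finite
# by construction, so they need no scan.
has_only_finite(A) = all(isfinite, nonzeros(A))

A = sprand(100, 100, 0.05)
@assert has_only_finite(A)
A[3, 7] = NaN
@assert !has_only_finite(A)
```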
