Fastest way to check for Inf or NaN in an array?

MilesCranmer · February 23, 2022, 5:04am

What is the fastest way to check for Inf or NaN in an array? This has surprisingly become the bottleneck in one of my codes, and I’m looking for the fastest possible strategy

I noticed that running sum on an array, then checking if the output is finite, is faster than doing any(isnan.(x)) on the array, which I found weird although I realize that sum is extremely optimized and probably not any. This is also because my arrays will contain very few NaNs/Infs, so any will have to search through the entire array.

In fact, just using sum instead of any is actually more efficient on a very sparse array in general:

fast_any(x) = sum(x) > 0
x = zeros(Bool, 10000)

@btime any(x)  #   3.120 μs
@btime fast_any(x) #  287.138 ns

A whole order of magnitude… (I note when the array is not sparse, any is the clear winner. But for an array with very few true, sum is faster, which I find odd.)

Here are my current implementations of Inf/NaN checkers, with their speeds given below:

function simple_check(x)
    any(isnan.(x)) || any(.!isfinite.(x))
end

function simd_check(x, n)
    isnan_or_inf = Array{Bool, 1}(undef, n)
    @inbounds @simd for j=1:n
        isnan_or_inf[j] = isnan(x[j]) || !isfinite(x[j])
    end
    any(isnan_or_inf)
end

function sum_check(x)
    s = sum(x)
    isnan(s) || !isfinite(s)
end

The performances are (using x = randn(Float32, 1000) - i.e., no non-finite values):

julia> @btime simple_check(x)
  649.153 ns (6 allocations: 8.81 KiB)

julia> @btime simd_check(x, 1000)
  558.973 ns (1 allocation: 1.06 KiB)

julia> @btime sum_check(x)
  90.423 ns (0 allocations: 0 bytes)

Now, obviously I don’t actually need to compute the sum of this array, but actually using the sum operation is significantly faster than the others! (and note that summing an Inf or NaN will result in another non-finite value).

Is there a faster way to check for NaNs and Infs? I note that my arrays seldom will actually have one, so the method should be optimized for this.

Edit #1: Updated times:

julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  321.970 ns (0 allocations: 0 bytes)
julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 1000)
  80.913 ns (0 allocations: 0 bytes)

and, if overflows are an issue:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 1000)
  229.604 ns (1 allocation: 4.06 KiB)

(since 0 * Inf = NaN)
so sum is still faster…

jling · February 23, 2022, 5:14am

to avoid allocation:

any(isnan, x)

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x) ,xs)
       end
simple_check (generic function with 1 method)

julia> @btime simple_check(x) setup=x=zeros(Bool, 10000)
  1.142 ns (0 allocations: 0 bytes)
false

Oscar_Smith · February 23, 2022, 5:15am

First of all, NaN isn’t finite, so all you can get that for free, so isfinite is the only check you need.

jling · February 23, 2022, 5:15am

btw don’t you just want:

all(isfinite, x)

?

Oscar_Smith · February 23, 2022, 5:17am

Also, your sum check is incorrect if the sum of x overflows.

MilesCranmer · February 23, 2022, 5:17am

I get this for all:

julia> @btime all(isfinite, x)
  129.762 ns (0 allocations: 0 bytes)

and similar for any(isnan, x).

~~Actually this one looks like the current winner, thanks @jling:~~

(Edit: sum is faster, see below)

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x) ,xs)
       end
julia> @btime simple_check(x)
  8.634 ns (0 allocations: 0 bytes)

jling · February 23, 2022, 5:19am

they should be the same speed:

julia> @btime all(isfinite, x) setup=x=zeros(Bool, 10000)
  1.172 ns (0 allocations: 0 bytes)
true

what Julia version?

MilesCranmer · February 23, 2022, 5:21am

Good point @Oscar_Smith, I guess isfinite is all I need. And good point about overflows.

It’s still weird how sum and any are comparable though. Shouldn’t any be significantly faster for a Bool array?

Oscar_Smith · February 23, 2022, 5:22am

for Bool arrays, Julia is able to prove that the sum must be an integer, so it is always finite, while the any approach isn’t getting fully constant folded.

MilesCranmer · February 23, 2022, 5:23am

@jling My benchmarks of each implementation uses x=randn(Float32, 1000), hence the difference in performance. I am using the 1.8 dev version.

jling · February 23, 2022, 5:24am

julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x) ,xs)
       end
simple_check (generic function with 1 method)


julia> @btime simple_check(x) setup=x=randn(Float32, 1000)
  690.549 ns (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  466.772 ns (0 allocations: 0 bytes)
true

MilesCranmer · February 23, 2022, 5:27am

~~Not sure why (probably I’m doing something wrong) but I’m not seeing the same. By the way you should modify the simple_check to use !isfinite only just like the all.~~

(edit: updated times below)

Oscar_Smith · February 23, 2022, 5:27am

x2 = rand(Bool, 100000);
@code_typed sum_check(x2)
CodeInfo(
1 ─ %1 = Base.identity::typeof(identity)
│        invoke Base._simple_count(%1::typeof(identity), x::Vector{Bool}, 0::Int64)::Int64
└──      return false
) => Bool

yeah. The sum_check is constant foldable to false.

MilesCranmer · February 23, 2022, 5:31am

Not sure how to interpret this, sorry. Is Base._simple_count faster than any for some unfair reason?

~~But yeah looks like the expression any(xi -> !isfinite(xi), x) is quite fast - thanks!~~

Edit: Nevermind, sum is still faster. See below.

MilesCranmer · February 23, 2022, 5:33am

Note that I’m inputting a Float32 array into sum_check. The Bool example was just for any vs sum in general, not for NaN checking.

MilesCranmer · February 23, 2022, 5:36am

@jling I ran it again and now I’m seeing comparable times. No idea what went wrong in my post above, sorry about that.


julia> @btime simple_check(x)
  3.130 μs (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x)
  3.260 μs (0 allocations: 0 bytes)
true

(10000 size array now)

MilesCranmer · February 23, 2022, 5:40am

Actually, nevermind. Still an unsolved mystery. See below:

julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
  3.109 μs (0 allocations: 0 bytes)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
  795.135 ns (0 allocations: 0 bytes)
true

Regarding the potential overflow @Oscar_Smith, I can actually do the following, and still get better performance with sum:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
  1.450 μs (2 allocations: 39.11 KiB)

Since 0 times Inf is NaN, this technically is valid.

So, sum is still the way to go it seems. Not sure why it’s faster…

Oscar_Smith · February 23, 2022, 5:53am

With LoopVectorization you can go even faster.

function h(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i]*0
    end
    return isfinite(s)
end

mcabbott · February 23, 2022, 5:53am

I presume this is because all / any exit early if they can, which might prevent SIMD?

julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
  min 3.125 μs, mean 3.180 μs (0 allocations)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
  min 804.945 ns, mean 813.873 ns (0 allocations)
true

julia> @btime all(isfinite, x) setup=(x=randn(Float32, 10000); x[33] = Inf)  # much faster
  min 12.387 ns, mean 12.580 ns (0 allocations)
false

julia> @btime isfinite(sum(x)) setup=(x=randn(Float32, 10000); x[33] = Inf)  # the same
  min 804.945 ns, mean 811.919 ns (0 allocations)
false

Also note that you can multiply by zero without the allocation:

julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
  min 1.429 μs, mean 2.423 μs (2 allocations, 39.11 KiB)
true

julia> @btime isfinite(sum(y -> y*0, x)) setup=x=randn(Float32, 10000)
  min 949.261 ns, mean 962.644 ns (0 allocations)
true

Oscar_Smith · February 23, 2022, 5:59am

Here is the absolute fastest version. It uses a combination of early exits and vectorization to get maximum performance.

function y(x)
    for i in firstindex(x):64:(lastindex(x)-63)
        s = zero(eltype(x))
        @turbo for j in 0:63
            s += x[i+j]*0
        end
        !isfinite(s) && return false
        end
    return all(isfinite, @view x[max(end-64,begin):end])
end

(for sparse arrays, you want to do the same thing, but on nonzeros(x) instead of x)

Topic		Replies	Views
@fastmath for library functions? (2x slowdown of binary search because of NaN checks in Base) Performance fast-math	14	761	January 30, 2023
Optimizing sums of products (dot products) Performance simd , linearalgebra , sum	17	626	September 24, 2024
Help understanding vectorization (or lack thereof) Performance	15	1221	June 8, 2018
Usage of lazy vector instead of function Performance	22	954	November 7, 2022
ANN: LoopVectorization 0.12: multithreading and better handling of discontiguous memory accesses Performance	16	2170	March 17, 2021

Fastest way to check for Inf or NaN in an array?

Related topics