# Fastest way to check for Inf or NaN in an array?

What is the fastest way to check for Inf or NaN in an array? This has surprisingly become the bottleneck in one of my codes, and I’m looking for the fastest possible strategy.

I noticed that running `sum` on an array and then checking whether the output is finite is faster than doing `any(isnan.(x))` on the array, which I found weird, although I realize that `sum` is extremely optimized and `any` probably is not. It also matters that my arrays will contain very few NaNs/Infs, so `any` has to search through the entire array.

In fact, just using `sum` instead of `any` is actually more efficient on a very sparse array in general:

``````
fast_any(x) = sum(x) > 0
x = zeros(Bool, 10000)

@btime any(x)  #   3.120 μs
@btime fast_any(x) #  287.138 ns
``````

A whole order of magnitude… (I note that when the array is not sparse, `any` is the clear winner. But for an array with very few `true` values, `sum` is faster, which I find odd.)

Here are my current implementations of Inf/NaN checkers, with their speeds given below:

``````
function simple_check(x)
    any(isnan.(x)) || any(.!isfinite.(x))
end

function simd_check(x, n)
    isnan_or_inf = Array{Bool, 1}(undef, n)
    @inbounds @simd for j in 1:n
        isnan_or_inf[j] = isnan(x[j]) || !isfinite(x[j])
    end
    any(isnan_or_inf)
end

function sum_check(x)
    s = sum(x)
    isnan(s) || !isfinite(s)
end
``````

The performances are (using `x = randn(Float32, 1000)`, i.e., an array with no non-finite values):

``````
julia> @btime simple_check(x)
  649.153 ns (6 allocations: 8.81 KiB)
``````
``````
julia> @btime simd_check(x, 1000)
  558.973 ns (1 allocation: 1.06 KiB)
``````
``````
julia> @btime sum_check(x)
  90.423 ns (0 allocations: 0 bytes)
``````

Now, obviously I don’t actually need to compute the sum of this array, but using the `sum` operation is significantly faster than the alternatives! (Note that summing an array containing an Inf or NaN will produce another non-finite value.)
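
As a quick sanity check that this is safe to rely on: IEEE arithmetic propagates non-finite values through addition, so a single non-finite element makes the whole `sum` non-finite.

``````julia
# Sanity check: one Inf or NaN anywhere in the array makes the
# reduction non-finite, so a single `sum` can stand in for a
# full elementwise scan.
@assert isfinite(sum([1.0, 2.0, 3.0]))
@assert sum([1.0, 2.0, Inf]) == Inf
@assert isnan(sum([1.0, 2.0, NaN]))
@assert !isfinite(sum([1.0, 2.0, NaN]))
``````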

Is there a faster way to check for NaNs and Infs? My arrays will seldom actually contain one, so the method should be optimized for that case.

Edit #1: Updated times:

``````
julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  321.970 ns (0 allocations: 0 bytes)

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 1000)
  80.913 ns (0 allocations: 0 bytes)
``````

and, if overflows are an issue:

``````
julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 1000)
  229.604 ns (1 allocation: 4.06 KiB)
``````

(since `0 * Inf = NaN`)
so `sum` is still faster…
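
To see why the `.* 0` trick preserves the check: zero times a non-finite value is `NaN`, while zero times anything finite is zero, so the sum can never overflow.

``````julia
# Zero times a non-finite value is NaN; zero times a finite value is 0.
# So `sum(x .* 0)` is NaN exactly when `x` contains an Inf or NaN,
# and the sum itself cannot overflow.
@assert isnan(0.0f0 * Inf32)
@assert isnan(0.0f0 * NaN32)
@assert sum(Float32[1.0, 2.0] .* 0) == 0.0f0
@assert isnan(sum(Float32[1.0, -Inf] .* 0))
``````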


To avoid allocation:

``````
any(isnan, x)
``````

``````
julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end
simple_check (generic function with 1 method)

julia> @btime simple_check(x) setup=x=zeros(Bool, 10000)
  1.142 ns (0 allocations: 0 bytes)
false
``````

First of all, `NaN` isn’t finite, so you get that check for free: `isfinite` is the only check you need.
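
Concretely:

``````julia
# `isfinite` is false for NaN as well as for both infinities,
# so a separate `isnan` check adds nothing.
@assert !isfinite(NaN)
@assert !isfinite(Inf)
@assert !isfinite(-Inf)
@assert isfinite(1.0)
@assert isfinite(floatmax(Float64))
``````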


btw don’t you just want:

``````
all(isfinite, x)
``````

?


Also, your `sum` check is incorrect if the sum of `x` overflows.
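
For example (a constructed case, not from the benchmarks in this thread), an array whose elements are all finite can still sum to `Inf`:

``````julia
# False positive from overflow: every element is finite,
# but the running sum exceeds floatmax(Float32) and becomes Inf.
x = fill(floatmax(Float32), 10)
@assert all(isfinite, x)     # the elements are fine
@assert !isfinite(sum(x))    # but the sum check reports non-finite
``````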

I get this for `all`:

``````
julia> @btime all(isfinite, x)
  129.762 ns (0 allocations: 0 bytes)
``````

and similar for `any(isnan, x)`.

Actually this one looks like the current winner, thanks @jling:

(Edit: `sum` is faster, see below)

``````
julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end

julia> @btime simple_check(x)
  8.634 ns (0 allocations: 0 bytes)
``````

they should be the same speed:

``````
julia> @btime all(isfinite, x) setup=x=zeros(Bool, 10000)
  1.172 ns (0 allocations: 0 bytes)
true
``````

what Julia version?

Good point @Oscar_Smith, I guess `isfinite` is all I need. And good point about overflows.

It’s still weird how `sum` and `any` are comparable though. Shouldn’t `any` be significantly faster for a Bool array?

For `Bool` arrays, Julia is able to prove that the sum must be an integer, so it is always finite, while the `any` approach isn’t getting fully constant-folded.
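
A rough way to see this (my illustration, not from the thread): the reduction over `Bool` is typed as `Int`, and `isnan`/`isfinite` on integers are compile-time constants, so the whole check collapses to `false`.

``````julia
# For Bool input the sum is an Int, and integers are never NaN
# and always finite, so `isnan(s) || !isfinite(s)` folds to false.
x = rand(Bool, 100)
s = sum(x)
@assert s isa Int
@assert !isnan(s)
@assert isfinite(s)
``````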


@jling My benchmarks of each implementation uses `x=randn(Float32, 1000)`, hence the difference in performance. I am using the 1.8 dev version.

``````
julia> function simple_check(xs)
           any(x -> isnan(x) || !isfinite(x), xs)
       end
simple_check (generic function with 1 method)

julia> @btime simple_check(x) setup=x=randn(Float32, 1000)
  690.549 ns (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x) setup=x=randn(Float32, 1000)
  466.772 ns (0 allocations: 0 bytes)
true
``````

Not sure why (probably I’m doing something wrong), but I’m not seeing the same. By the way, you should modify `simple_check` to use only `!isfinite`, just like the `all` version.

(edit: updated times below)

``````
x2 = rand(Bool, 100000);
@code_typed sum_check(x2)
CodeInfo(
1 ─ %1 = Base.identity::typeof(identity)
│        invoke Base._simple_count(%1::typeof(identity), x::Vector{Bool}, 0::Int64)::Int64
└──      return false
) => Bool
``````

Yeah, `sum_check` is constant-foldable to `false`.

Not sure how to interpret this, sorry. Is `Base._simple_count` faster than `any` for some unfair reason?

But yeah looks like the expression `any(xi -> !isfinite(xi), x)` is quite fast - thanks!

Edit: Nevermind, `sum` is still faster. See below.

Note that I’m inputting a `Float32` array into `sum_check`. The `Bool` example was just for `any` vs `sum` in general, not for NaN checking.

@jling I ran it again and now I’m seeing comparable times. No idea what went wrong in my post above, sorry about that.

``````
julia> @btime simple_check(x)
3.130 μs (0 allocations: 0 bytes)
false

julia> @btime all(isfinite, x)
3.260 μs (0 allocations: 0 bytes)
true
``````

(10000 size array now)

Actually, nevermind. Still an unsolved mystery. See below:

``````
julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
3.109 μs (0 allocations: 0 bytes)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
795.135 ns (0 allocations: 0 bytes)
true
``````

Regarding the potential overflow @Oscar_Smith, I can actually do the following, and still get better performance with sum:

``````
julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
1.450 μs (2 allocations: 39.11 KiB)
``````

Since `0 * Inf` is `NaN`, this is technically valid.

So, `sum` is still the way to go it seems. Not sure why it’s faster…

With `LoopVectorization` you can go even faster.

``````
using LoopVectorization

function h(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i]*0
    end
    return isfinite(s)
end
``````

I presume this is because `all` / `any` exit early if they can, which might prevent SIMD?

``````
julia> @btime all(isfinite, x) setup=x=randn(Float32, 10000)
min 3.125 μs, mean 3.180 μs (0 allocations)
true

julia> @btime isfinite(sum(x)) setup=x=randn(Float32, 10000)
min 804.945 ns, mean 813.873 ns (0 allocations)
true

julia> @btime all(isfinite, x) setup=(x=randn(Float32, 10000); x[1] = Inf)  # much faster
min 12.387 ns, mean 12.580 ns (0 allocations)
false

julia> @btime isfinite(sum(x)) setup=(x=randn(Float32, 10000); x[1] = Inf)  # the same
min 804.945 ns, mean 811.919 ns (0 allocations)
false
``````

Also note that you can multiply by zero without the allocation:

``````
julia> @btime isfinite(sum(x .* 0)) setup=x=randn(Float32, 10000)
min 1.429 μs, mean 2.423 μs (2 allocations, 39.11 KiB)
true

julia> @btime isfinite(sum(y -> y*0, x)) setup=x=randn(Float32, 10000)
min 949.261 ns, mean 962.644 ns (0 allocations)
true
``````

Here is the absolute fastest version. It uses a combination of early exits and vectorization to get maximum performance.

``````
# uses @turbo from LoopVectorization
function y(x)
    for i in firstindex(x):64:(lastindex(x)-63)
        s = zero(eltype(x))
        @turbo for j in 0:63
            s += x[i+j]*0
        end
        !isfinite(s) && return false
    end
    return all(isfinite, @view x[max(end-64,begin):end])
end
``````

(for sparse arrays, you want to do the same thing, but on `nonzeros(x)` instead of `x`)
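
For instance, here is a minimal sketch for a `SparseVector` (`SparseArrays` is a standard library; `sparse_check` is just a name I made up). The structural zeros are finite by definition, so only the stored values need scanning:

``````julia
using SparseArrays

# Only the stored values of a sparse vector can be non-finite,
# so it suffices to scan `nonzeros(x)`.
sparse_check(x::SparseVector) = all(isfinite, nonzeros(x))

x = sprandn(10_000, 0.01)
@assert sparse_check(x)
x[1] = NaN            # store a non-finite value
@assert !sparse_check(x)
``````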
