Inconsistency between `findmax` and `maximum` with respect to `NaN`

wsshin · June 12, 2017, 3:16am

I read this discussion about max and maximum returning NaN when the collection includes NaN. For example,

julia> VERSION
v"0.7.0-DEV.82"

julia> v = [1.0, NaN, 2.0]
3-element Array{Float64,1}:
   1.0
 NaN
   2.0

julia> maximum(v)
NaN

However, findmax still returns the maximum among non-NaN values and its index:

julia> findmax(v)
(2.0, 3)

This seems inconsistent. Is this something intended or a bug?

bjarthur · June 12, 2017, 11:12am

seems like a bug to me. i’d suggest creating a new issue for it on the julia github repo. or better yet, fix it and submit a PR.

and thanks for the link to the discussion on NaNs. therein i found a reference to NaNMath.jl, which i am going to immediately start using!

Tamas_Papp · June 12, 2017, 1:34pm

This violates certain invariances that users may expect, such as

findmax(x)[1] == maximum(x)

so I would say it is a bug. If we take the “NaN should propagate” view, it both should be NaN, but I have no good suggestion for the position returned by findmax (maybe 0? yet another corner case). If we take the view that “NaN encodes missing data”, both should return 2.0.

wsshin · June 12, 2017, 4:42pm

Thanks! I just filed an issue: https://github.com/JuliaLang/julia/issues/22337.

wsshin · June 12, 2017, 6:06pm

By the way, the original motivation of this question was to find a way to perform NaN-ignoring maximum. I thought I could use findmax for this if the above reported behavior of findmax was intended.

However, now that the above NaN-ignoring behavior of findmax seems like a bug, I need a better way to find the maximum ignoring NaN. Some discussion thread suggested something like maximum(v[!isnan.(v)]), which I don’t like because of too many unnecessary allocations.

The best way I’ve figured out is to define nanmax that returns the greater of the two numbers ignoring NaN, and to use it in reduce:

julia> nanmax(x,y) = isnan(x) ? y : (isnan(y) ? x : max(x,y))
nanmax (generic function with 1 method)

julia> reduce(nanmax, v)
2.0

A couple of notes:

You can generalize this to multidimensional arrays A using reducedim, as reducedim(nanmax, A, 2) that finds the maxima along the second dimension ignoring NaN.
When A is StaticArray, you can use the type-stable version of this: reducedim(nanmax, A, Val{2}). By using Val{2} instead of 2 as the search dimension indicator, you can make reducedim to produce a StaticArray. See https://github.com/JuliaArrays/StaticArrays.jl/pull/86.

Edit: @DNF pointed out that y > x returns false for all x if y = NaN, so the above nanmax can be simplified to this:

nanmax(x, y) = isnan(x) ? y : (y > x ? y : x)

DNF · June 14, 2017, 8:25am

For the array maximum you can save quite a bit of time by only checking for isnan up to the point where the first non-NaN element is encountered, and then switching to ordinary > comparison, since NaN > maxval is false.

The code isn’t as clean though.

DNF · June 14, 2017, 8:43am

Also, I think

nanmax(x, y) = isnan(x) ? y : (y > x ? y : x)

is slightly faster, and has shorter generated code.

wsshin · June 14, 2017, 7:29pm

How about this? Define nanmax as

nanmax(x, y) = x<y ? y : x

and do

reduce(nanmax, -Inf, v)

The above nanmax does not perform isnan at all, so it’s a bit weird to call it nanmax, but it works by the following reason.

If x is guaranteed to be non-NaN, the above nanmax always returns the correct non-NaN maximum between x and y, regardless of y being NaN or non-NaN. Now, note that reduce(nanmax, v) performs reduction by putting the intermediate reduction result up to the current entry of v into x and the next entry of v into y. Therefore, if the first entry of v is non-NaN, x of nanmax(x,y) stays always non-NaN throughout the entire reduction procedure of reduce(nanmax, v).

The only case reduce(nanmax, v) does not work is when the first entry of v is NaN. We can avoid this case by using reduce(nanmax, -Inf, v), which is effectively the same as reduce(nanmax, w) for w = [-Inf; v] whose first entry is non-NaN.

The problem of this is that it produces -Inf even if v is completely filled with NaN, for which case we may want to get NaN. However, if we know v has at least one non-NaN entry, this technique works.

Inspired by this idea, we can define nanmaximum more generally as follows:

nanmaximum(v::AbstractVector{T}) where {T<:Number} = reduce((x,y) -> x<y ? y : x, typemin(T), v)

DNF · June 14, 2017, 7:43pm

Yes, this is pretty much what I was thinking as well. Here’s my (ugly) code which also handles all NaNs. Speed is the same as your code:

function _maximum(y, istart)
    val = typemin(eltype(y))
    for i in eachindex(y)[istart:end]
        if y[i] > val
            val = y[i]
        end
    end
    return val
end

function nanmaximum(y)
    for i in eachindex(y)
        if !isnan(y[i])
            return _maximum(y, i)
        end
    end
    return convert(eltype(y), NaN)
end

mkborregaard · June 15, 2017, 9:41am

@bjarthur above linked to NaNMath.jl, which I believe is the intended way of dealing with this, i.e. you could use NaNMath.maximum rather than implementing your own replacemen… The code for maximum in NaNMath looks a lot like @DNF’s

function maximum{T<:AbstractFloat}(x::AbstractArray{T})
    result = convert(eltype(x), NaN)
    for i in x
        if !isnan(i)
            if (isnan(result) || i > result)
                result = i
            end
        end
    end
    return result
end

Note that this (and DNF’s code) may not work for all iterable collection of all eltypes, which is why NaNMath restricts it to {T<:AbstractFloat}(x::AbstractArray{T}). This can be annoying in practice if your code sometimes passes something else, like a UnitRange{Int} to the function.
In Plots this was resolved by defining an internal function that dispatched to the NaNMath versoin for {T<:AbstractFloat}(x::AbstractArray{T}), and to the Base version as fallback.

Topic		Replies	Views
Ignoring `NaN` with `findmax()` General Usage nan	9	4026	May 25, 2020
NaN and maximum/minimum? New to Julia	3	6363	November 13, 2018
min(NaN,5) = NaN (why no @fastmath?) General Usage nan	47	1921	February 2, 2023
Max of a vector New to Julia question	31	8569	February 24, 2025
Why `maximum` uses `max` and `findmax` uses `isless`? New to Julia	2	784	August 18, 2017

Inconsistency between `findmax` and `maximum` with respect to `NaN`

Related topics