Ignoring NaN in elementwise aggregations

Tamas_Papp · December 18, 2019, 9:04am

I would like to do calculations like

mean(a_bunch_of_similar_arrays)

but with NaNs quietly ignored.

MWE with an inefficient and inelegant solution:

A1 = [NaN 1; 2 3]

A2 = [1 NaN; 2 3]

As = [A1, A2]

function elementwise_ignoreNaN(f, arrays)
    ax = axes(first(arrays))
    I = CartesianIndices(ax)
    @assert all(a -> axes(a) == ax, arrays)
    map(I) do ι
        x = filter(!isnan, [a[ι] for a in arrays])
        f(x)
    end
end

elementwise_ignoreNaN(mean, As) # [1 1; 2 3]

but I wonder if this functionality exists already in some package, or if I could do this better.

tkluck · December 18, 2019, 10:12am

Any reason not to use broadcasting? E.g. something similar to

julia> nanless(f, a...) = f(filter(!isnan, a))
nanless (generic function with 1 method)

julia> nanless.(mean, A1, A2)
2×2 Array{Float64,2}:
 1.0  1.0
 2.0  3.0

Tamas_Papp · December 18, 2019, 10:20am

I have about 5000 such arrays (sorry, should have said), and I am not sure that broadcasting is the good match for that.

nilshg · December 18, 2019, 10:52am

Could skipmissing help if you use missing instead of NaN? For some reason it doesn’t seem to work on your MWE but it promises to be the efficient way to filter missing values in calculations.

tkf · December 18, 2019, 11:45pm

Maybe you are asking for a package that does this out-of-the-box, but I think using LazyArrays.@~ to create Broadcasted objects is a neat approach:

xs = [[NaN 1; 2 3], [1 NaN; 2 3]]

add!!(a, b) = a .+ b
add!!(a::AbstractArray, b) = a .+= b
sumpw(f, xs) = mapreduce(f, add!!, xs)

using LazyArrays: @~
fillnan(x) = isnan(x) ? zero(x) : x
sumpw(x -> (@~ fillnan.(x)), xs) ./
    sumpw(x -> (@~ .!isnan.(x)), xs)

With this approach, you don’t have to allocate intermediate arrays like fillnan.(x). The array allocated by materializing fillnan.(xs[1]) +. fillnan.(xs[2]) is used for a entire base case of the reduction. You can also use mapfoldl instead of mapreduce to minimize the allocation.

(I’ll probably add add!! to BangBang.jl so that above code would be more generic.)

This is also easily parallelizable by using Transducers; sumpw(f, xs) = reduce(add!!, Map(f), xs).

tbeason · December 19, 2019, 7:32pm

Have you had a look at NaNMath.jl?

Tamas_Papp · December 20, 2019, 9:27am

Thanks for the suggestion - NaNMath.jl should simplify the filter/reduce step. I still need a neat way to map a bunch of arrays elementwise though.

I don’t think it is appropriate for my use case, since the objects being skipped are elements of arrays in a vector, not elements of a vector.

It is indeed neat (once I understand it ), but conceptually isn’t it just

As = [[NaN 1; 2 3], [1 NaN; 2 3]]
sum(A -> map(x -> isnan(x) ? zero(x) : x, A), As) ./ sum(A -> map(!isnan, A), As)

tkf · December 20, 2019, 9:43am

That’s right. The reason why I thought it was neat was that @~-based implementation allocates arrays only when necessary; i.e., thanks to the clever design of the broadcasting machinery, fillnan.(x) and a .+ b are fused even though those expressions are not syntactically in the same “dot call expression.”

Topic		Replies	Views
Ignoring NaNs when calculating means of columns of a dataframe General Usage dataframes , nan	1	399	February 17, 2024
NA-ignoring aggregations General Usage	15	5085	January 5, 2019
Nanmean options? General Usage	17	8869	February 11, 2024
Ignoring `NaN` with `findmax()` General Usage nan	9	3829	May 25, 2020
Compute mean of array where all values could be missing New to Julia	5	381	April 21, 2021

Ignoring NaN in elementwise aggregations

Related topics