Ignoring NaN in elementwise aggregations

I would like to do calculations like

mean(a_bunch_of_similar_arrays)

but with NaNs quietly ignored.

MWE with an inefficient and inelegant solution:

A1 = [NaN 1; 2 3]

A2 = [1 NaN; 2 3]

As = [A1, A2]

function elementwise_ignoreNaN(f, arrays)
    ax = axes(first(arrays))
    I = CartesianIndices(ax)
    @assert all(a -> axes(a) == ax, arrays)
    map(I) do ι
        x = filter(!isnan, [a[ι] for a in arrays])
        f(x)
    end
end

elementwise_ignoreNaN(mean, As) # [1 1; 2 3]

but I wonder if this functionality exists already in some package, or if I could do this better.

2 Likes

Any reason not to use broadcasting? E.g. something similar to

julia> nanless(f, a...) = f(filter(!isnan, a))
nanless (generic function with 1 method)

julia> nanless.(mean, A1, A2)
2×2 Array{Float64,2}:
 1.0  1.0
 2.0  3.0

I have about 5000 such arrays (sorry, should have said), and I am not sure that broadcasting is the good match for that.

Could skipmissing help if you use missing instead of NaN? For some reason it doesn’t seem to work on your MWE but it promises to be the efficient way to filter missing values in calculations.

Maybe you are asking for a package that does this out-of-the-box, but I think using LazyArrays.@~ to create Broadcasted objects is a neat approach:

xs = [[NaN 1; 2 3], [1 NaN; 2 3]]

add!!(a, b) = a .+ b
add!!(a::AbstractArray, b) = a .+= b
sumpw(f, xs) = mapreduce(f, add!!, xs)

using LazyArrays: @~
fillnan(x) = isnan(x) ? zero(x) : x
sumpw(x -> (@~ fillnan.(x)), xs) ./
    sumpw(x -> (@~ .!isnan.(x)), xs)

With this approach, you don’t have to allocate intermediate arrays like fillnan.(x). The array allocated by materializing fillnan.(xs[1]) +. fillnan.(xs[2]) is used for a entire base case of the reduction. You can also use mapfoldl instead of mapreduce to minimize the allocation.

(I’ll probably add add!! to BangBang.jl so that above code would be more generic.)

This is also easily parallelizable by using Transducers; sumpw(f, xs) = reduce(add!!, Map(f), xs).

2 Likes

Have you had a look at NaNMath.jl?

2 Likes

Thanks for the suggestion - NaNMath.jl should simplify the filter/reduce step. I still need a neat way to map a bunch of arrays elementwise though.

I don’t think it is appropriate for my use case, since the objects being skipped are elements of arrays in a vector, not elements of a vector.

It is indeed neat (once I understand it :wink:), but conceptually isn’t it just

As = [[NaN 1; 2 3], [1 NaN; 2 3]]
sum(A -> map(x -> isnan(x) ? zero(x) : x, A), As) ./ sum(A -> map(!isnan, A), As)

That’s right. The reason why I thought it was neat was that @~-based implementation allocates arrays only when necessary; i.e., thanks to the clever design of the broadcasting machinery, fillnan.(x) and a .+ b are fused even though those expressions are not syntactically in the same “dot call expression.”

1 Like