I often find myself needing aggregation functions that skip NA-like values, e.g. NaN, missing, or nothing. There doesn't seem to be a package yet that provides even basic functions like sum, mean, median, etc., so I wrote a very simple wrapper myself, which can skip any specified value. The whole implementation is just a few lines of code:
using JuliennedArrays: julienne
agg(arr, func, dimscode; skip=undef) = _agg(arr, func, dimscode, skip)
_agg(arr, func, dimscode, skip::UndefInitializer) = map(func, julienne(arr, dimscode))
_agg(arr, func, dimscode, skip::Function) = map(a -> func(Iterators.filter(!skip, a)), julienne(arr, dimscode))
# isequal rather than ==, so that skip=NaN and skip=missing actually match
_agg(arr, func, dimscode, skip) = _agg(arr, func, dimscode, isequal(skip))
and it works reasonably fast:
using BenchmarkTools, Statistics
arr = rand(Float64, 100, 100, 100)
@btime agg(arr, mean, (:, *, *), skip=0.0);
@btime agg(arr, mean, (:, *, *));
@btime mean(arr, dims=1);
outputs
1.317 ms (20007 allocations: 1015.83 KiB)
352.940 μs (10004 allocations: 703.27 KiB)
248.873 μs (3 allocations: 78.22 KiB)
One thing I don't like here is that I cannot specify the aggregation axes by index (dims=1 above), or by name in the case of an AxisArray. Is there a reasonably simple way to implement this?
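For the index-based case, one possible direction is to build on `mapslices` from Base instead of `julienne`. The following is only a minimal sketch: the names `aggdims`, `_aggdims` and `NoSkip` are hypothetical, performance has not been measured against the timings above, and named axes of an AxisArray are not handled.

```julia
using Statistics

# Hypothetical sentinel type meaning "do not skip anything".
struct NoSkip end

# Hypothetical helper: apply `func` to every slice of `arr` taken along `dims`,
# dropping the elements matched by `skip` (a predicate or a plain value) first.
aggdims(func, arr; dims, skip=NoSkip()) = _aggdims(func, arr, dims, skip)

# Nothing to skip: forward directly to mapslices.
_aggdims(func, arr, dims, ::NoSkip) = mapslices(func, arr; dims=dims)

# `skip` is a predicate: filter each slice lazily before aggregating.
_aggdims(func, arr, dims, skip::Function) =
    mapslices(a -> func(Iterators.filter(!skip, a)), arr; dims=dims)

# `skip` is a value: compare with isequal so that NaN and missing match.
_aggdims(func, arr, dims, skip) = _aggdims(func, arr, dims, isequal(skip))

A = [1.0 NaN; 3.0 4.0]
aggdims(mean, A; dims=1, skip=isnan)  # 1×2 matrix: [2.0 4.0]
aggdims(mean, A; dims=2, skip=NaN)    # 2×1 matrix: [1.0; 3.5]
```

As with `mean(arr, dims=1)`, the reduced dimensions are kept with length 1 rather than dropped. `func` has to accept a generic iterator when `skip` is given, which holds for `sum` and `mean` but not necessarily for every aggregation function.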
Another question: does this look general and useful enough to be included in some package?