One-pass NaN variance

I’d like to try and use the implementation of Welford’s algorithm in Base.Statistics, but filter out NaNs along the way.

First thing I tried is the following:

Statistics.var(Iterators.filter(!isnan, vals))

However, this runs about 20x slower than without the Iterators.filter, making a no-go.

Why does the filter have such an overhead?

Without going through the details of _var, I’d guess that by filtering values one-by-one, you prevent vectorization. Using transducers may help a little, but I don’t think you can eliminate the main slowdown unless you filter the array first.

I don’t see 20x slowdown, but maybe that’s possible with AVX512 hardware.

julia> x = rand(10^6);

julia> x[x.<0.5] .= NaN;

julia> @btime var($x)
  832.718 μs (0 allocations: 0 bytes)
NaN

julia> @btime var(Iterators.filter(!isnan, $x))
  4.725 ms (0 allocations: 0 bytes)
0.020807221377812737

julia> @btime var(filter(!isnan, $x))
  2.342 ms (3 allocations: 7.63 MiB)
0.020807221377812827

julia> @btime var(filter!(!isnan, y)) setup=(y=copy(x)) evals=1
  1.732 ms (2 allocations: 16 bytes)
0.020833079313851994

julia> versioninfo()
Julia Version 1.6.0-DEV.464
Commit 29826c2c08 (2020-07-14 22:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 7 4700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, btver1)
Environment:
  JULIA_NUM_THREADS = 8

One thing to try would be OnlineStats https://github.com/joshday/OnlineStats.jl

2 Likes

A nice benefit of OnlineStats.jl is the ThreadsX.jl interface. In my original example, I’m able to beat the unfiltered single-threaded performance:

julia> @btime fit!(FTSeries(Variance(); filter=!isnan), $x)
  5.106 ms (3 allocations: 80 bytes)
FTSeries
└─ Variance: n=500317 | value=0.0208331


julia> @btime ThreadsX.reduce(FTSeries(Variance(); filter=!isnan), $x)
  716.292 μs (3041 allocations: 174.73 KiB)
FTSeries
└─ Variance: n=500317 | value=0.0208331
3 Likes