Hi, OnlineSampleStatistics.jl is a package for online, single-pass estimation of statistical moments of any order from a data stream. The OnlineStats.jl package was not suited to my needs (mainly camera calibration) for two reasons, both of which this package addresses:
working with arrays: fast estimation of mean, variance and even higher-moment images from a stream of data frames;
numerically stable estimation for non-centered variables: as illustrated in the README, OnlineStats suffers from a catastrophic loss of precision on non-centered data (which can even produce a negative variance).
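To make the precision point concrete, here is a minimal, self-contained sketch in plain Julia (not the package's code) of the general mechanism: the textbook one-pass formula that accumulates Σx and Σx² loses almost all significant digits when the data sit far from zero, while Welford's running-mean update does not. The function names `naive_var` and `welford_var` are hypothetical, and this is not a claim about how OnlineStats or the README example compute their variance.

```julia
# Textbook one-pass variance: accumulate Σx and Σx², then subtract.
# On non-centered data both terms are huge and nearly equal, so the
# subtraction cancels most significant digits (catastrophic cancellation).
function naive_var(xs)
    n = 0
    s = 0.0
    s2 = 0.0
    for x in xs
        n += 1
        s += x
        s2 += x^2
    end
    return (s2 - s^2 / n) / (n - 1)
end

# Welford's update: track the running mean and the sum of squared
# deviations from it, so no large nearly-equal terms are subtracted.
function welford_var(xs)
    n = 0
    μ = 0.0
    M2 = 0.0
    for x in xs
        n += 1
        δ = x - μ
        μ += δ / n
        M2 += δ * (x - μ)
    end
    return M2 / (n - 1)
end

xs = 1e9 .+ [4.0, 7.0, 13.0, 16.0]   # true sample variance is 30
naive_var(xs)     # far from 30; can even come out negative
welford_var(xs)   # 30.0
```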
OnlineSampleStatistics.jl implements two types: UnivariateStatistic for a single variable, and IndependentStatistic for arrays of UnivariateStatistic.
UnivariateStatistic is a subtype of OnlineStat{T} from OnlineStats.jl, so it can leverage that package's functionality, including parallel processing through Transducers.jl.
IndependentStatistic is a subtype of AbstractArray{UnivariateStatistic{T,K},N} that uses a ZippedArray (from ZippedArrays.jl) to keep updates efficient.
Both are fast (UnivariateStatistic is as fast as Variance() from OnlineStats), perform zero allocations, and support weighted data.
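For readers wondering what weighted, allocation-free streaming over image frames looks like, here is a minimal sketch under stated assumptions: one weighted Welford state per pixel (the weighted incremental update usually attributed to West, 1979), stored as three plain arrays and updated element by element so that folding in a frame allocates nothing. `PixelMoments`, `update!` and `variance` are hypothetical names, not the package's API; the package itself builds on ZippedArrays instead.

```julia
# Hypothetical per-pixel accumulator (illustration only, not the
# package's API): one weighted Welford state per pixel, kept as
# three plain arrays with the same size as a frame.
struct PixelMoments{T<:AbstractFloat,N}
    W::Array{T,N}    # per-pixel sum of weights
    μ::Array{T,N}    # per-pixel running weighted mean
    M2::Array{T,N}   # per-pixel weighted sum of squared deviations
end

PixelMoments{T}(dims::Integer...) where {T<:AbstractFloat} =
    PixelMoments(zeros(T, dims), zeros(T, dims), zeros(T, dims))

# Fold one frame into the accumulator in place, with per-pixel weights
# (e.g. 0 for masked or saturated pixels). The loop works element by
# element, so no temporary arrays are allocated.
function update!(acc::PixelMoments, frame::AbstractArray, weights::AbstractArray)
    @inbounds for i in eachindex(acc.μ, frame, weights)
        w = weights[i]
        iszero(w) && continue
        W = acc.W[i] + w
        δ = frame[i] - acc.μ[i]
        μ = acc.μ[i] + (w / W) * δ
        acc.M2[i] += w * δ * (frame[i] - μ)
        acc.μ[i] = μ
        acc.W[i] = W
    end
    return acc
end

# Weighted population variance map Σw(x - μ)² / Σw (NaN where Σw == 0).
variance(acc::PixelMoments) = acc.M2 ./ acc.W

acc = PixelMoments{Float64}(2, 3)
w = ones(2, 3)
for k in (4.0, 7.0, 13.0, 16.0)
    update!(acc, fill(1e6 + k, 2, 3), w)   # stand-in for camera frames
end
variance(acc)   # every pixel ≈ 22.5
```

Here `variance` returns the weighted population variance; an unbiased estimate needs a correction that depends on whether the weights are frequency weights or reliability weights.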
It is not required for my use case, but I know how to adapt it to GPUArrays and Unitful variables.
For those interested, the package is already available in the General registry. It is my first package, so feedback is welcome.
For astronomical optical cameras, calibrating each observation requires computing the so-called Dark (shutter closed), Flat (image of a flat field) and Sky (image of a part of the sky without objects) images, and potentially many other kinds of calibration images, to estimate the readout noise, the dark current, the gain and other pixel-wise parameters. Because this requires precise mean and variance maps for each of them, we accumulate hundreds or thousands of individual megapixel frames. Higher moments can also be useful to detect bad pixels, which are numerous on infrared cameras.
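Since higher moments matter here, this is a minimal scalar sketch of how central moments of arbitrary order can be updated one sample at a time, using the single-observation update formula from Pébay (2008), "Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments". It is an illustration of that technique, not the package's implementation; `CentralMoments`, `push_sample!` and `central_moment` are hypothetical names.

```julia
# Running central sums M[p] = Σᵢ (xᵢ - μ)^p for p = 2..P.
# M[1] is kept only for indexing convenience and stays 0.
mutable struct CentralMoments
    n::Int               # number of samples seen
    μ::Float64           # running mean
    M::Vector{Float64}   # central sums of order 1..P
end

CentralMoments(P::Integer) = CentralMoments(0, 0.0, zeros(P))

# Single-sample update (Pébay 2008), with n the count after the update
# and δ = x - μ_old:
#   M_p += Σ_{k=1}^{p-2} binom(p,k) M_{p-k} (-δ/n)^k
#          + ((n-1)/n δ)^p (1 - (-1/(n-1))^(p-1))
function push_sample!(s::CentralMoments, x::Real)
    n1 = s.n             # count before this sample
    n = n1 + 1
    δ = x - s.μ
    # Go from the highest order down so each M[p - k] used on the
    # right-hand side still holds its value from before this sample.
    for p in length(s.M):-1:2
        Mp = s.M[p]
        for k in 1:(p - 2)
            Mp += binomial(p, k) * s.M[p - k] * (-δ / n)^k
        end
        if n1 > 0        # skip the last term for the first sample (0 * Inf)
            Mp += (n1 / n * δ)^p * (1 - (-1 / n1)^(p - 1))
        end
        s.M[p] = Mp
    end
    s.μ += δ / n
    s.n = n
    return s
end

# p-th sample central moment (biased, i.e. divided by n).
central_moment(s::CentralMoments, p::Integer) = s.M[p] / s.n
```

For bad-pixel detection one would typically look at standardized moments, for example the skewness `central_moment(s, 3) / central_moment(s, 2)^1.5`; running the same update independently for each pixel would give moment maps.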
Not to be that guy, but from your post it seems like this package could have been a couple of PRs to OnlineStats? What necessitated an entire new package?
For the precision issue and the estimation of moments higher than 4, it certainly could have been (and still can be) a PR to OnlineStats, and since UnivariateStatistic implements the OnlineStatsBase API, I believe that would probably be enough.
But for my main purpose (arrays), the Group object of OnlineStats does not seem suited to 2k×2k pixel arrays. In addition, I wanted IndependentStatistic to be a subtype of AbstractArray{UnivariateStatistic} so that it could use the powerful ZippedArrays package, and I didn't find an easy way to implement the OnlineStatsBase API for it.