Hi,
OnlineSampleStatistics.jl is a package for online, single-pass estimation of statistical moments of a data stream. The OnlineStats.jl package did not fit my needs (mainly camera calibration) for two reasons, both of which this package addresses:
- working with arrays: fast estimation of mean, variance, and even higher-moment images from a stream of data frames,
- numerically stable estimation for non-centered variables: as illustrated in the README, OnlineStats suffers from catastrophic precision issues on non-centered data (which can even lead to a negative variance).
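
To make the second point concrete, here is a minimal sketch in plain Julia of the kind of failure that non-centered data can trigger. It is only an illustration of the general phenomenon: it compares the textbook "sum of squares" one-pass formula with Welford's update, and it makes no claim about how OnlineStats or this package is implemented internally. The function names `naive_variance` and `welford_variance` are made up for the example.

```julia
# Textbook one-pass formula: accumulate sum(x) and sum(x^2), subtract at the end.
# On data with a large offset the two terms are huge and nearly equal,
# so the subtraction loses most (or all) significant digits.
function naive_variance(xs)
    n = 0
    s = 0.0
    s2 = 0.0
    for x in xs
        n += 1
        s += x
        s2 += x^2
    end
    return (s2 - s^2 / n) / (n - 1)
end

# Welford's update: track the running mean and the sum of squared deviations.
# Only differences between a sample and the current mean are ever squared.
function welford_variance(xs)
    n = 0
    m = 0.0
    m2 = 0.0
    for x in xs
        n += 1
        d = x - m
        m += d / n
        m2 += d * (x - m)
    end
    return m2 / (n - 1)
end

# Non-centered data: small spread on top of a large offset.
# The deviations from the mean are -6, -3, 3, 6, so the sample variance is 30.
xs = 1e9 .+ [4.0, 7.0, 13.0, 16.0]

println("naive:   ", naive_variance(xs))
println("welford: ", welford_variance(xs))
```

On this input the Welford version returns 30.0. The naive version cannot: the two quantities it subtracts are around 4e18, where consecutive Float64 values are 512 apart, so the information about a spread of a few units is already lost before the subtraction.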
OnlineSampleStatistics.jl implements two types: `UnivariateStatistic` for a single variable and `IndependentStatistic` for arrays of `UnivariateStatistic`.
`UnivariateStatistic` is a subtype of `OnlineStat{T}` from OnlineStats.jl, so it can leverage that package's functionality, including the Transducers.jl methods for parallel processing. `IndependentStatistic` is a subtype of `AbstractArray{UnivariateStatistic{T,K},N}` that uses ZippedArrays to keep updates efficient.
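
Parallel processing of this kind relies on being able to merge statistics computed independently on separate chunks of the stream. As a rough illustration of that idea for the first two moments (plain Julia, not the package's actual code; the names `ChunkMoments`, `summarize` and `combine` are hypothetical and exist only in this sketch), the pairwise combination formula looks like this:

```julia
# Hypothetical summary of one chunk: count, mean, sum of squared deviations.
struct ChunkMoments
    n::Int
    mean::Float64
    m2::Float64
end

# Single-pass (Welford) summary of one chunk.
function summarize(xs)
    n, m, m2 = 0, 0.0, 0.0
    for x in xs
        n += 1
        d = x - m
        m += d / n
        m2 += d * (x - m)
    end
    return ChunkMoments(n, m, m2)
end

# Combine two summaries without revisiting the data.
# Assumes at least one of the two chunks is non-empty.
function combine(a::ChunkMoments, b::ChunkMoments)
    n = a.n + b.n
    d = b.mean - a.mean
    mean = a.mean + d * b.n / n
    m2 = a.m2 + b.m2 + d^2 * a.n * b.n / n
    return ChunkMoments(n, mean, m2)
end

xs = 1e9 .+ [4.0, 7.0, 13.0, 16.0]
merged = combine(summarize(xs[1:2]), summarize(xs[3:4]))
println("mean = ", merged.mean, ", variance = ", merged.m2 / (merged.n - 1))
```

Each chunk can be summarized on a different worker, and only the three numbers per chunk need to be exchanged.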
Both are fast (`UnivariateStatistic` is as fast as `Variance()` from OnlineStats), allocation-free, and support weighted data.
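
For readers who want a feel for what a weighted, in-place, per-pixel update involves, here is a small sketch in plain Julia. It is not the package's API or implementation: the `PixelStats` type and the `update!` and `variance` functions are hypothetical names for this example, it only covers mean and variance, and it assumes the weights are frequency-like (a weight of 2 counts as seeing the frame twice).

```julia
# Hypothetical container: per-pixel sum of weights, running mean,
# and weighted sum of squared deviations.
struct PixelStats{A<:AbstractArray{Float64}}
    wsum::A
    mean::A
    m2::A
end

PixelStats(dims::Int...) = PixelStats(zeros(dims...), zeros(dims...), zeros(dims...))

# Weighted Welford-style update, element by element.
# The loop writes into the existing arrays and creates no new ones.
function update!(s::PixelStats, frame::AbstractArray, w::Real = 1.0)
    for i in eachindex(s.mean, frame)
        s.wsum[i] += w
        d = frame[i] - s.mean[i]
        s.mean[i] += w * d / s.wsum[i]
        s.m2[i] += w * d * (frame[i] - s.mean[i])
    end
    return s
end

# Variance image, assuming frequency weights.
variance(s::PixelStats) = s.m2 ./ (s.wsum .- 1)

# A stream of 2x2 frames with a large offset (non-centered data).
stats = PixelStats(2, 2)
for v in (4.0, 7.0, 13.0, 16.0)
    update!(stats, fill(1e9 + v, 2, 2))
end

# A weighted stream: the value 4.0 is given twice the weight of 1.0.
weighted = PixelStats(2, 2)
update!(weighted, fill(1.0, 2, 2), 1.0)
update!(weighted, fill(4.0, 2, 2), 2.0)
```

After the first loop every pixel of `stats.mean` is `1e9 + 10` and every pixel of `variance(stats)` is 30.0. The weighted stream behaves like the samples 1, 4, 4, giving a mean of 3 and a variance of 3 at every pixel.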
It is not required for my use case, but I know how to adapt it to GPU arrays and Unitful variables.
For those interested, the package is already available in the General registry. It is my first package, and feedback is welcome.