[ANN] WeightedOnlineStats.jl

Check out WeightedOnlineStats.jl (https://github.com/gdkrmr/WeightedOnlineStats.jl) if you like OnlineStats.jl but have been missing proper statistical weighting.


This looks great, thank you for your work!
Out of curiosity, why a new package instead of PRs to OnlineStats.jl?


OnlineStats.jl has a very different concept of weights, so we felt it was better to do this in a separate package.


Why is this not using StatsBase.Weights?


StatsBase.Weights are actual vectors that store all the weights, which is not what we want for something like OnlineStats. There is also no need to fix the type of weights when creating the object. If you have an idea how StatsBase.Weights can give more value to WeightedOnlineStats, I am all ears.
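To illustrate the point about not storing the weights, here is a minimal sketch of a streaming weighted mean that keeps only two running numbers. The type `StreamingWeightedMean` and the function `update!` are hypothetical names made up for this example; this is not the WeightedOnlineStats.jl implementation.

```julia
# Hypothetical accumulator for illustration only.
mutable struct StreamingWeightedMean
    mean::Float64   # running weighted mean
    wsum::Float64   # running sum of weights
end
StreamingWeightedMean() = StreamingWeightedMean(0.0, 0.0)

# Fold in a single observation x with weight w; no weight vector is kept.
function update!(o::StreamingWeightedMean, x::Real, w::Real)
    w == 0 && return o          # a zero weight changes nothing (and avoids 0/0)
    o.wsum += w
    o.mean += (w / o.wsum) * (x - o.mean)
    return o
end

# Fold in a batch of observations and their weights, one pair at a time.
function update!(o::StreamingWeightedMean, xs, ws)
    for (x, w) in zip(xs, ws)
        update!(o, x, w)
    end
    return o
end

o = StreamingWeightedMean()
update!(o, [1.0, 2.0], [1.0, 1.0])   # first batch
update!(o, 3.0, 2.0)                 # a later observation
println(o.mean)
```

The state stays the same size no matter how many observations arrive, which is the property a wrapper around a full weight vector does not provide on its own.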

We could use multiple dispatch and do something like this

o = fit!(WeightedVariance(), x, w)
var(o, AnalyticWeights)

instead of

var(o, corrected = true, weight_type = :analytic)

but this is still very different from what StatsBase does:

var(x, aweights(w))

StatsBase.Weights are a light wrapper around an AbstractVector (no data is copied; only a reference and the sum are stored). You could do something like StatsBase.Weights(FillArrays.Ones(x)), which is close to a zero-cost operation.

It is easier to use the flexible, proven API than to have a different one. At least for statistical modeling, the StatsBase weights abstraction comes in very handy. It should also lead to less code duplication, as the functions are already implemented with the weighted generalization.


How would you suggest doing that?

o = WeightedMean()
fit!(o, x, aweights(w))

what if I do

fit!(o, x2, fweights(w2))

then on the same o?


o = WeightedMean()
fit!(o, x, aweights(w))


fit!(o, x2, fweights(w2))

Not sure what fit! does in this case, but if it needs an iterable, a function, and weights, it could look like this:

fit(x, ::Function, ::AbstractWeights = FrequencyWeights(Ones(x)))

I am very much against the verbose syntax and the extra dependency. Also, it does not make much sense to include this in the fit! function or in the WeightedOnlineStat, because there is no added value (except maybe for not having to keep track of sum(w.^2)).

The best way I see this could be included would be by something like:

var(o::WeightedVar, ::Type{AnalyticWeights}) 
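A minimal sketch of what dispatching on the weight type at query time could look like. Everything here is hypothetical: `Analytic` and `Frequency` are stand-in marker types (not StatsBase's `AnalyticWeights` and `FrequencyWeights`), `StreamingWeightedVar`, `update!` and `weightedvar` are made-up names, and the corrections are the usual formulas for frequency and analytic (reliability) weights as I understand them, not code taken from either package.

```julia
# Hypothetical marker types, used only to select a bias correction.
abstract type WeightKind end
struct Analytic <: WeightKind end
struct Frequency <: WeightKind end

# Hypothetical accumulator; it stores sums only, never the weights themselves.
mutable struct StreamingWeightedVar
    mean::Float64    # running weighted mean
    s::Float64       # running weighted sum of squared deviations
    wsum::Float64    # running sum of weights
    w2sum::Float64   # running sum of squared weights
end
StreamingWeightedVar() = StreamingWeightedVar(0.0, 0.0, 0.0, 0.0)

# Incremental weighted update of the mean and the squared deviations.
function update!(o::StreamingWeightedVar, x::Real, w::Real)
    w == 0 && return o
    o.wsum += w
    o.w2sum += w^2
    delta = x - o.mean
    o.mean += (w / o.wsum) * delta
    o.s += w * delta * (x - o.mean)
    return o
end

# Uncorrected weighted variance.
weightedvar(o::StreamingWeightedVar) = o.s / o.wsum
# Corrected, treating the weights as counts of repeated observations.
weightedvar(o::StreamingWeightedVar, ::Type{Frequency}) = o.s / (o.wsum - 1)
# Corrected, treating the weights as relative importance; needs sum(w.^2).
weightedvar(o::StreamingWeightedVar, ::Type{Analytic}) =
    o.s / (o.wsum - o.w2sum / o.wsum)

o = StreamingWeightedVar()
for (x, w) in zip([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
    update!(o, x, w)
end
println(weightedvar(o, Frequency))
println(weightedvar(o, Analytic))
```

In this sketch the weight type is chosen only when the variance is requested, so nothing has to be fixed when the object is created or when it is updated.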

Verbose syntax?

fit(x, mean)


wts = aweights(w)
fit(x, mean, wts)


o = WeightedMean()
wts = aweights(w)
fit!(o, x, wts)

About the extra dependency: OnlineStats, which is a dependency of WeightedOnlineStats, already has StatsBase as a dependency, so no extra dependency is involved.


This is not how OnlineStats works.

Just giving suggestions, take what you think is good. Just make the case clear, since the dependency and verbosity arguments aren’t that compelling. There may be other valid reasons.


FYI: here is the GitHub issue discussing this.