Check out WeightedOnlineStats.jl (GitHub: gdkrmr/WeightedOnlineStats.jl, a weighted version of OnlineStats.jl) if you like OnlineStats.jl but were missing proper statistical weighting.
This looks great, thank you for your work!
Out of curiosity, why a new package instead of PRs to OnlineStats.jl?
OnlineStats.jl has a very different concept of weights, so we felt it was better to do this in a separate package.
Why is this not using StatsBase.Weights?
StatsBase.Weights are actual vectors that store all the weights, which is not what we want for something like OnlineStats. There is also no need to fix the type of weights when creating the object. If you have an idea how StatsBase.Weights can give more value to WeightedOnlineStats, I am all ears.
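To illustrate the point about not storing weights, here is a minimal sketch of a weighted mean that is updated one observation at a time and keeps only the running mean and the running sum of weights. The type `RunningWeightedMean` and the function `update!` are hypothetical names made up for this example; this is not the WeightedOnlineStats.jl implementation, and it assumes strictly positive weights.

```julia
# Hypothetical accumulator for illustration only (not the package's code).
mutable struct RunningWeightedMean
    μ::Float64   # current weighted mean
    W::Float64   # sum of all weights seen so far
end
RunningWeightedMean() = RunningWeightedMean(0.0, 0.0)

# Fold a single observation x with weight w into the running state.
# Assumes w > 0, otherwise the first update would divide by zero.
function update!(o::RunningWeightedMean, x::Real, w::Real)
    o.W += w
    o.μ += (w / o.W) * (x - o.μ)
    return o
end

# Consume paired observations and weights one at a time;
# neither collection is stored in the accumulator.
function update!(o::RunningWeightedMean, xs, ws)
    for (x, w) in zip(xs, ws)
        update!(o, x, w)
    end
    return o
end

x = [1.0, 2.0, 4.0]
w = [1.0, 1.0, 2.0]
o = update!(RunningWeightedMean(), x, w)
```

The result in `o.μ` should agree with the batch formula `sum(x .* w) / sum(w)`, while the accumulator itself holds only two numbers regardless of how many observations it has seen.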
We could use multiple dispatch and do something like this:
o = fit!(WeightedVariance(), x, w)
var(o, AnalyticWeights)
instead of
var(o, corrected = true, weight_type = :analytic)
but this is still very different from what StatsBase does:
var(x, aweights(w))
StatsBase.Weights are a light wrapper around an AbstractVector (no data is copied, just a reference and the sum). You could do something like StatsBase.Weights(FillArrays.Ones(x)), which is basically a close-to-zero-cost operation.
It is easier to use the flexible, proven API than to have a different one. At least for statistical modeling, the StatsBase weights abstraction comes in very handy. It should also lead to less code duplication, since the functions are already implemented in their weighted generalization.
How would you suggest doing that?
o = WeightedMean()
fit!(o, x, aweights(w))
What if I then do
fit!(o, x2, fweights(w2))
on the same o?
Between,
o = WeightedMean()
fit!(o, x, aweights(w))
and
fit!(o, x2, fweights(w2))
Not sure what fit! does in this case, but if it needs an iterable, a function, and weights, it could look like:
fit(x, ::Function, ::AbstractWeights = FrequencyWeights(Ones(x)))
I am very much against the verbose syntax and the extra dependency. Also, it does not make much sense to include this in the fit! function or the WeightedOnlineStat because there is no added value (except maybe for not keeping track of sum(w.^2)).
The best way I see this could be included would be by something like:
var(o::WeightedVar, ::Type{AnalyticWeights})
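As a rough sketch of that idea: the accumulator below tracks the weighted mean, the weighted sum of squared deviations, sum(w) and sum(w.^2), and the bias correction is chosen by dispatching on a weight-kind type at query time rather than at fit time. Everything here is hypothetical and for illustration only: `Analytic` and `Frequency` are stand-in marker types rather than the StatsBase weight types, and `RunningWeightedVar`, `update!` and `wvar` are not the package's API. The corrections follow the usual definitions for analytic weights (divide by W - sum(w.^2)/W) and frequency weights (divide by W - 1).

```julia
# Hypothetical marker types standing in for StatsBase's weight types.
abstract type WeightKind end
struct Analytic <: WeightKind end
struct Frequency <: WeightKind end

# Hypothetical accumulator for illustration only (not the package's code).
mutable struct RunningWeightedVar
    μ::Float64    # running weighted mean
    S::Float64    # running weighted sum of squared deviations
    W::Float64    # sum of weights
    W2::Float64   # sum of squared weights, only needed for Analytic
end
RunningWeightedVar() = RunningWeightedVar(0.0, 0.0, 0.0, 0.0)

# Incremental update for one observation x with weight w (assumes w > 0).
function update!(o::RunningWeightedVar, x::Real, w::Real)
    o.W += w
    o.W2 += w^2
    δ = x - o.μ
    o.μ += (w / o.W) * δ
    o.S += w * δ * (x - o.μ)
    return o
end

# Uncorrected variance, then corrections selected by dispatch.
wvar(o::RunningWeightedVar) = o.S / o.W
wvar(o::RunningWeightedVar, ::Type{Analytic}) = o.S / (o.W - o.W2 / o.W)
wvar(o::RunningWeightedVar, ::Type{Frequency}) = o.S / (o.W - 1)

o = RunningWeightedVar()
for (x, w) in zip([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
    update!(o, x, w)
end
```

With this layout the same fitted object can answer `wvar(o, Analytic)` and `wvar(o, Frequency)`, at the cost of always tracking sum(w.^2) even when only the frequency correction is wanted.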
Verbose syntax?
Compare,
fit(x, mean)
or
wts = aweights(w)
fit(x, mean, wts)
to
o = WeightedMean()
wts = aweights(w)
fit!(o, x, wts)
About the extra dependency: OnlineStats, which is a dependency of WeightedOnlineStats, already has StatsBase as a dependency, so there is no extra dependency involved.
This is not how OnlineStats works.
Just giving suggestions, take what you think is good. Just make the case clear, since the dependency and verbosity arguments aren’t that compelling. There may be other valid reasons.
FYI: here is the GitHub issue discussing this.