# Do I need to specify what kind of weights I am providing?

I found helpful instructions on using `weights` and its source code (here, here, here) but I am still not sure I understand how to use it correctly. Maybe I am just confusing/misunderstanding something basic here…

My question: Do I need to specify what kind of weights I am supplying? If no, how does `weights` infer if I am providing frequency, analytic, or probability weights?

I am wondering about this because I need to transform frequency weights included in a dataset. A few of these transformed weights are less than 1. Hence, the distinction between frequency and probability or analytic weights is no longer obvious.

As a result, I am not sure whether differences in weighted outcomes are due to my transformation of the weights, to `weights` assuming that the transformed weights are not of the same kind as the original ones, or both. See the example below.

Many thanks for clarifying and explaining.

``````julia
using Distributions, Random, StatsBase

rng = MersenneTwister(1234);
N = 1000;

a_dist = Normal(1000, 100); a = rand(rng, a_dist, N); # vector with obs

weights_frequency_orig = rand(rng, 1:100, N); # vector with ‘original’ freq weights
weights_frequency_transformed = rand(rng, 0.1:0.1:100.0, N); # transformed freq weights

a_std_weight_frequency_orig = std(a, weights(weights_frequency_orig));
a_std_weight_frequency_transformed = std(a, weights(weights_frequency_transformed));

a_std_weight_frequency_orig == a_std_weight_frequency_transformed # false
``````

Is this because `weights_frequency_orig != weights_frequency_transformed` or because `weights` assumes a different kind of weights?

This is probably a misunderstanding. You should state the kind explicitly with `FrequencyWeights` (or `fweights`), `ProbabilityWeights` (or `pweights`), or `AnalyticWeights` (or `aweights`). Plain `weights` makes no assumption about the kind and simply errors out if you try to do anything that needs this information (see your third link, a bit further down).
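To make this concrete, here is a minimal sketch with made-up data (the values of `x` and `w` are illustrative assumptions, not from the question). It assumes a StatsBase version in which the weighted `std` accepts the `corrected` keyword and in which the generic `Weights` type refuses bias correction, as described above.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0]   # made-up observations
w = [2, 1, 3, 1]           # made-up frequency counts

# The uncorrected weighted std needs no knowledge of the weight kind,
# so the generic Weights type is enough.
s_generic = std(x, weights(w); corrected=false)

# Bias correction depends on what the weights mean, so the generic type
# is expected to refuse it.
generic_rejects_correction = try
    std(x, weights(w); corrected=true)
    false
catch
    true
end

# With frequency weights, the corrected std matches the std of the data
# with each observation repeated w[i] times.
expanded = reduce(vcat, fill.(x, w))
s_freq = std(x, fweights(w); corrected=true)

@show generic_rejects_correction
@show s_freq ≈ std(expanded)
```

So `weights` does not infer anything: it only carries the numbers, and anything that needs the kind has to be given one of the typed constructors.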


Am I missing something, or is the fact that you call `rand()` twice (generating independent weights for the original and “transformed” cases) the culprit here? What happens if you replace `==` in the last line with `≈`?

Notice that setting `rng` fixes the random state for the first trial but it keeps updating after that:

``````julia
julia> rng = MersenneTwister(1234);

julia> rand(rng)
0.5908446386657102

julia> rand(rng)
0.7667970365022592

julia> rand(rng)
0.5662374165061859
``````
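If the goal is to draw the same weights twice, one option is to recreate the generator with the same seed (or `copy` it) before the second draw. A small sketch, where the vector length of 5 is an arbitrary choice for illustration:

```julia
using Random

rng = MersenneTwister(1234)
first_draw = rand(rng, 1:100, 5)    # advances the state of rng
second_draw = rand(rng, 1:100, 5)   # continues from the advanced state

# A fresh generator with the same seed replays the first sequence.
rng_replay = MersenneTwister(1234)
replayed = rand(rng_replay, 1:100, 5)

# copy() snapshots the current state, so both generators continue identically.
rng_snapshot = copy(rng)
same_continuation = rand(rng, 1:100, 5) == rand(rng_snapshot, 1:100, 5)

@show replayed == first_draw
@show same_continuation
```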

Thanks! Makes sense.

So `weights` does not make any inference about the type of weights and just forwards responses of functions in which a given type of weights leads to errors?

Thanks for looking into this - I realize now my illustrative example is confusing.

The weights are supposed to differ by construction: `weights_frequency_transformed` is drawn from [0.1, 100] while `weights_frequency_orig` is drawn from [1, 100].

But I see your point that drawing from `rng` repeatedly brings in additional differences. I am not sure this makes a big difference since N = 1000, but it is definitely confusing.

Several things going on here:

• `std()` itself is defined in Statistics; StatsBase provides `weights()` and adds the weighted method `std(a, w)`
• Generating the two weight vectors randomly and independently ensures that the outputs won’t be exactly the same
• `0.1:0.1:100.0` should have an upper limit of `10.0` to be equivalent to `1:100` divided by 10
• 1000 sample points are not enough for the two standard deviations to come out close

Here is a corrected example with random weights:

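``````julia
using Random, Statistics, StatsBase   # std() is from Statistics; weights() and the weighted std method are from StatsBase

N = 100000                       # More points
a = 100 .* 1000 .+ randn(N)      # NB: this is 100_000 plus N(0, 1) noise (std ≈ 1); Normal(1000, 100) would be 1000 .+ 100 .* randn(N)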

weights_frequency_orig = rand(1:100, N)
weights_frequency_transformed = rand(0.1:0.1:10.0, N)  # Upper limit corrected to 10.0

a_std_weight_frequency_orig = std(a, weights(weights_frequency_orig));
a_std_weight_frequency_transformed = std(a, weights(weights_frequency_transformed));

@show a_std_weight_frequency_orig
@show a_std_weight_frequency_transformed
# a_std_weight_frequency_orig = 0.9992338257618083
# a_std_weight_frequency_transformed = 0.9968675255497309
``````

And one where I actually transform the original weights instead of generating new ones:

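``````julia
using Random, Statistics, StatsBase   # weights() comes from StatsBase

N = 1000
a = 100 .* 1000 .+ randn(N)      # as above: 100_000 plus N(0, 1) noise, so std ≈ 1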

weights_frequency_orig = rand(1:100, N)
weights_frequency_transformed = weights_frequency_orig ./ 10  # Transformed instead

a_std_weight_frequency_orig = std(a, weights(weights_frequency_orig));
a_std_weight_frequency_transformed = std(a, weights(weights_frequency_transformed));

@show a_std_weight_frequency_orig
@show a_std_weight_frequency_transformed
# a_std_weight_frequency_orig = 1.00674757512939
# a_std_weight_frequency_transformed = 1.0067475751293908
``````

Notice that the result matches up to floating-point error.
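The match is expected: without bias correction, the weighted variance is `sum(w .* (x .- μ).^2) / sum(w)`, and a common scale factor in `w` cancels between numerator and denominator. A minimal sketch of this, with made-up data and a hypothetical helper `manual_std` written out only for comparison; `corrected=false` is passed explicitly rather than relying on a default:

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0]   # made-up observations
w = [10, 20, 30, 40]       # made-up weights
w_scaled = w ./ 10         # the same weights on a different scale

# Uncorrected weighted standard deviation, written out by hand.
function manual_std(x, w)
    μ = sum(w .* x) / sum(w)
    return sqrt(sum(w .* (x .- μ) .^ 2) / sum(w))
end

s_orig = std(x, weights(w); corrected=false)
s_scaled = std(x, weights(w_scaled); corrected=false)

@show s_orig ≈ s_scaled            # approximate comparison tolerates rounding
@show s_orig ≈ manual_std(x, w)
```

Comparing with `≈` instead of `==` is the safer check here, since rescaling can change the last few bits of the result.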

To answer your original question, it isn’t clear to me which type of weights is assumed by the generic `weights()` function. For a side-by-side comparison like this, rather than hoping that `weights()` will figure out what you intended, it’d be better to use

``````julia
a_std_weight_frequency_orig = std(a, ProbabilityWeights(weights_frequency_orig))
a_std_weight_frequency_transformed = std(a, ProbabilityWeights(weights_frequency_transformed))
``````

or similar to remove the ambiguity, where the different weight constructors are as listed here.
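The choice of constructor matters once bias correction is requested, and it matters specifically for rescaled weights. The sketch below uses made-up data and a hypothetical helper `corrected_std`; it relies on the bias corrections documented for StatsBase, where the frequency-weight correction depends on the total `sum(w)` while the analytic and probability corrections do not change when all weights are multiplied by a constant.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0]   # made-up observations
w = [10, 20, 30, 40]       # made-up original weights
w_scaled = w ./ 10         # rescaled weights

# Hypothetical helper: corrected std for a given weight constructor.
corrected_std(make_weights, x, w) = std(x, make_weights(w); corrected=true)

freq_invariant = corrected_std(fweights, x, w) ≈ corrected_std(fweights, x, w_scaled)
analytic_invariant = corrected_std(aweights, x, w) ≈ corrected_std(aweights, x, w_scaled)
prob_invariant = corrected_std(pweights, x, w) ≈ corrected_std(pweights, x, w_scaled)

@show freq_invariant       # false: sum(w) acts as the sample size
@show analytic_invariant   # true
@show prob_invariant       # true
```

In other words, if the transformed weights are still meant as frequencies, rescaling them changes the implied sample size and therefore the corrected result; if they are analytic or probability weights, the scale is irrelevant.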