I noticed that when all of the probability weights are zero, sample (from StatsBase) always returns the first element. The following example shows this:
using StatsBase
println(sample([2, 3, 1], ProbabilityWeights([0.0, 0.0, 0.0]), 20))
Is this the intended behavior, and if so, why? Thanks!
This behavior does seem odd to me. From what I can tell, the weights are normalized to probabilities by dividing each weight by the sum of the weights. If that is true, the probabilities would be undefined whenever the weights sum to zero. Here is another case where the weights sum to zero, and this time the third element is always selected. It seems to select the element with the highest weight, or the first element in the case of a tie.
using StatsBase
println(sample([2, 3, 1], ProbabilityWeights([-1.0, -1.0, 2.0]), 200))
It would be good to explain this behavior in the documentation if it is intended.
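To make the zero-sum problem concrete, here is a minimal sketch. The first part only uses Base Julia to show what explicit normalization gives when the weights sum to zero. The second part is a guess at the mechanism: `cumulative_scan` is a hypothetical helper I wrote for illustration, not a StatsBase function, and it implements a plain linear scan over cumulative weights. I am assuming, without having confirmed it against the StatsBase source, that the library does something along these lines. If it does, both observations above would follow from the scan itself rather than from a "pick the highest weight" rule.

```julia
# Hypothetical helper, NOT part of StatsBase: pick an index by scanning
# cumulative weights until they reach u * sum(weights).
# `u` stands in for a uniform random draw in [0, 1).
function cumulative_scan(weights::AbstractVector{<:Real}, u::Real)
    threshold = u * sum(weights)
    i = 1
    running = weights[1]
    while running < threshold && i < length(weights)
        i += 1
        running += weights[i]
    end
    return i
end

zero_weights = [0.0, 0.0, 0.0]
mixed_weights = [-1.0, -1.0, 2.0]

# Explicit normalization is undefined when the sum is zero.
println(zero_weights ./ sum(zero_weights))    # every entry is 0/0, i.e. NaN
println(mixed_weights ./ sum(mixed_weights))  # division by zero gives -Inf, -Inf, Inf

# With a zero sum the threshold is 0 for every u, so the result no longer
# depends on the random draw at all.
for u in (0.0, 0.25, 0.5, 0.99)
    println((u, cumulative_scan(zero_weights, u), cumulative_scan(mixed_weights, u)))
end
```

In this sketch the all-zero weights always give index 1, because the running total (0) is never below the threshold (0). The weights [-1.0, -1.0, 2.0] always give index 3, because the running total stays negative until the last element. Either way, it seems safer to check that the weights are non-negative and have a positive sum before calling sample.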