StatsBase sample always returns the first element when all probabilities are zero

I noticed that when all the supplied probabilities are zero, sample (from StatsBase) always returns the first element. The following example shows this:

using StatsBase

println(sample([2, 3, 1], ProbabilityWeights([0.0, 0.0, 0.0]), 20))

Is this the intended behavior, and if so, why? Thanks!

This behavior does seem odd to me. From what I can tell, the weights are normalized to probabilities by dividing each weight by the sum of the weights. If so, the probabilities would be undefined whenever the weights sum to zero. Here is another case where the weights sum to zero, and this time the third element is always selected. It looks as if sample picks the element with the highest weight, or the first element in the case of a tie.

using StatsBase

println(sample([2, 3, 1], ProbabilityWeights([-1.0, -1.0, 2.0]), 200))

It would be good to explain this behavior in the documentation if it is intended.
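
As a rough illustration of how both results could arise, here is a minimal sketch of a cumulative-weight sampler. It assumes the threshold is drawn as rand() * sum(w) and that the index is found by walking the running sum of the weights; pick_index is a hypothetical helper written for this post, not StatsBase's actual code, so the real implementation may differ.

```julia
# Hypothetical helper (not part of StatsBase): walk the running sum of the
# weights and stop at the first index where it reaches the threshold t,
# or at the last index if the threshold is never reached.
function pick_index(w::AbstractVector{<:Real}, t::Real)
    i = 1
    cw = w[1]
    while cw < t && i < length(w)
        i += 1
        cw += w[i]
    end
    return i
end

zero_w = [0.0, 0.0, 0.0]
mixed_w = [-1.0, -1.0, 2.0]

# When the weights sum to zero, rand() * sum(w) is always 0.0,
# so the outcome no longer depends on the random draw.
println(pick_index(zero_w, rand() * sum(zero_w)))    # prints 1
println(pick_index(mixed_w, rand() * sum(mixed_w)))  # prints 3
```

Under these assumptions the all-zero case stops immediately at index 1, because the running sum 0.0 is not below the threshold 0.0. In the [-1.0, -1.0, 2.0] case the running sum stays negative until the last weight is added, so the walk ends at index 3. That would match the two observations above without any "highest weight" rule being involved.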


Good catch. We discussed this a long time ago and @bkamins fixed one method, but we dropped the ball on fixing the other methods. See the pull request "Error when negative weights or zero sum are used when sampling" by nalimilan (Pull Request #834, JuliaStats/StatsBase.jl on GitHub).
