Randomly generating an array that satisfies a condition

Hi everyone,

Say I have 3 numbers - x,y,z, that must sum to 1. Is there a way to generate a random combination of these 3 numbers that respects this condition?

This problem is trivial if it’s 2-dimensional since you can individually generate x using the rand() function and y would consequently be 1-x, but that method wouldn’t work for 3 dimensions and above.

Thank you!

1 Like

Maybe it isn’t the fastest way, but:

v = rand(3)
v ./= sum(v)
x,y,z = v

works, i.e. normalize by dividing by sum a collection of n uniform [0,1] randoms.

Nicer looking version of same calculation:

using LinearAlgebra

x,y,z = normalize!(rand(3), 1) # use 1-norm
2 Likes

Vectors are nice when you want to generate N numbers. But if you specifically want three, you could try with a tuple.

t = ntuple(_->rand(), 3)
x, y, z = t ./ sum(t)

Another interesting option is to use the Dirichlet Distribution:

using Distributions

dimension = 3
α = 1

V = Dirichlet(dimension , α) 

rand(V)

This might be nice if you need to calculate a lot of these combinations because you can do stuff like rand(V,100).

5 Likes

And another option:

diff(begin v = sort!(rand(3)); @inbounds push!(v, 1+v[1]); end)

isn’t so fast, but manifests idea of choosing points on circle of unit circumference and using intervals as the random values.

SUMMARY: DNF’s solution is fastest (least allocations) if dimension is a compile-time fixed small number.

Be carrefull about the density you want. The Dirichlet is not the same as normalizing a uniform vector.

In fact, this Dirichlet is the uniform distribution over the simplex, While normalizing a vector of rand() is not (if I recall correctly).

You can have the same result (uniformity over the simplex) by sampling like that :

u = .-log.(rand(3))
u ./= sum(u)

The negative sign is coming from the equations and may be removed.

5 Likes

Ah very true, transforming random numbers can be very tricky. I generated 100,000 random triplets for each method and plotted their distributions:

dir

div_sum

log

diff

Taking a uniform set of numbers and dividing by their sum appears to bias the numbers toward the mean of 1/3. The diff() method suggested by @Dan also produces a different distribution. Very interesting.

4 Likes

A better visualization would be to plot the bivariate density of (x.y) (ommiting z) as a surface : it should be the 3d simplex surface for uniformity.

After thinking about it, the univariate densities are not enough to conclude on the distribution of the triplet since you still lack the dependence structure. So you NEED to check bivariate uniformity. One bivariate density is enough since there are only two degree of freedom.

1 Like

The 3 variables cannot be independent, as 2 determine the third. Additionally, they are negatively correlated (as a big value for x forces small values for y,z). The Dirichlet distribution may be uniform on the simplex of distributions, but the marginals will not be uniform in this case (as the method using log shows).
The Dirichlet with other parameters can allow concentrating the density on the corners or the center of the simplex (and even making biased distributions on the simplex).
So, I guess, the Dirichlet (or log method) is the ‘simplest’ choice (and thus preferable in some Occam’s razor sense).

If you are psychologically satisfied with this pattern, you could “naturally” extend it in the following manner.
If in case x,y do x=rand() and y=1-x.
In the x,y,z case you could do x=rand(), y=rand()*(1-x) and z=(1-y)*(1-x).
This can be extended to any size…
to implement it you could use the accumulate function…

julia> p=accumulate((s,c)->((s[1]-s[2]), rand()*(s[1]-s[2])), 1:2,init=(1,rand()))
2-element Vector{Tuple{Float64, Float64}}:
 (0.7296027958824168, 0.590858165041365)
 (0.13874463084105182, 0.013489182506681588)

julia> x,y=(last.(p)...,)
(0.590858165041365, 0.013489182506681588)

julia> z=1-sum(last.(p))
0.39565265245195336

julia> x+y+z
1.0

The geometric representation of the unit simplex is the triangle ABC in the 3d space,
of vertices A(1,0,0), B(0,1,0), C(0,0,1).
Here is the result of uniform sampling from this simplex, using the variates from Exp(1), suggested by lrnv (it is a method thoroughly argumented in the Devroye’s book, page 207):
samplingsimplex-exp
Similar sampling from Dirichlet, proposed by RobertGregg.
But the method of normalizing 3-uniform vectors is far from being uniform:
samplingsimplex-normuvect

2 Likes

You seem to be referring to three plots, is one missing?

That is, it looks the same as the top one.

1 Like

Yes, because the plane of eqn x1+x2+x3=1 intetersects the axes Ox1, Ox2, Ox3 at A, B,C.