Randomly generating an array that satisfies a condition

Longblackcoffee · May 9, 2023, 10:00pm

Hi everyone,

Say I have 3 numbers - x,y,z, that must sum to 1. Is there a way to generate a random combination of these 3 numbers that respects this condition?

This problem is trivial if it’s 2-dimensional since you can individually generate x using the rand() function and y would consequently be 1-x, but that method wouldn’t work for 3 dimensions and above.

Thank you!

Dan · May 9, 2023, 10:15pm

Maybe it isn’t the fastest way, but:

v = rand(3)
v ./= sum(v)
x,y,z = v

works, i.e. normalize by dividing by sum a collection of n uniform [0,1] randoms.

Nicer looking version of same calculation:

using LinearAlgebra

x,y,z = normalize!(rand(3), 1) # use 1-norm

DNF · May 9, 2023, 10:42pm

Vectors are nice when you want to generate N numbers. But if you specifically want three, you could try with a tuple.

t = ntuple(_->rand(), 3)
x, y, z = t ./ sum(t)

RobertGregg · May 9, 2023, 10:57pm

Another interesting option is to use the Dirichlet Distribution:

using Distributions

dimension = 3
α = 1

V = Dirichlet(dimension , α) 

rand(V)

This might be nice if you need to calculate a lot of these combinations because you can do stuff like rand(V,100).

Dan · May 9, 2023, 11:02pm

And another option:

diff(begin v = sort!(rand(3)); @inbounds push!(v, 1+v[1]); end)

isn’t so fast, but manifests idea of choosing points on circle of unit circumference and using intervals as the random values.

SUMMARY: DNF’s solution is fastest (least allocations) if dimension is a compile-time fixed small number.

lrnv · May 9, 2023, 11:30pm

Be carrefull about the density you want. The Dirichlet is not the same as normalizing a uniform vector.

In fact, this Dirichlet is the uniform distribution over the simplex, While normalizing a vector of rand() is not (if I recall correctly).

You can have the same result (uniformity over the simplex) by sampling like that :

u = .-log.(rand(3))
u ./= sum(u)

The negative sign is coming from the equations and may be removed.

RobertGregg · May 10, 2023, 12:39am

Ah very true, transforming random numbers can be very tricky. I generated 100,000 random triplets for each method and plotted their distributions:

dir

div_sum

log

diff

Taking a uniform set of numbers and dividing by their sum appears to bias the numbers toward the mean of 1/3. The diff() method suggested by @Dan also produces a different distribution. Very interesting.

lrnv · May 10, 2023, 1:17am

A better visualization would be to plot the bivariate density of (x.y) (ommiting z) as a surface : it should be the 3d simplex surface for uniformity.

After thinking about it, the univariate densities are not enough to conclude on the distribution of the triplet since you still lack the dependence structure. So you NEED to check bivariate uniformity. One bivariate density is enough since there are only two degree of freedom.

Dan · May 10, 2023, 2:08am

The 3 variables cannot be independent, as 2 determine the third. Additionally, they are negatively correlated (as a big value for x forces small values for y,z). The Dirichlet distribution may be uniform on the simplex of distributions, but the marginals will not be uniform in this case (as the method using log shows).
The Dirichlet with other parameters can allow concentrating the density on the corners or the center of the simplex (and even making biased distributions on the simplex).
So, I guess, the Dirichlet (or log method) is the ‘simplest’ choice (and thus preferable in some Occam’s razor sense).

sprmnt21 · May 10, 2023, 5:12am

If you are psychologically satisfied with this pattern, you could “naturally” extend it in the following manner.
If in case x,y do x=rand() and y=1-x.
In the x,y,z case you could do x=rand(), y=rand()*(1-x) and z=(1-y)*(1-x).
This can be extended to any size…
to implement it you could use the accumulate function…

julia> p=accumulate((s,c)->((s[1]-s[2]), rand()*(s[1]-s[2])), 1:2,init=(1,rand()))
2-element Vector{Tuple{Float64, Float64}}:
 (0.7296027958824168, 0.590858165041365)
 (0.13874463084105182, 0.013489182506681588)

julia> x,y=(last.(p)...,)
(0.590858165041365, 0.013489182506681588)

julia> z=1-sum(last.(p))
0.39565265245195336

julia> x+y+z
1.0

empet · May 10, 2023, 7:06am

The geometric representation of the unit simplex is the triangle ABC in the 3d space,
of vertices A(1,0,0), B(0,1,0), C(0,0,1).
Here is the result of uniform sampling from this simplex, using the variates from Exp(1), suggested by lrnv (it is a method thoroughly argumented in the Devroye’s book, page 207):
samplingsimplex-exp
Similar sampling from Dirichlet, proposed by RobertGregg.
But the method of normalizing 3-uniform vectors is far from being uniform:
samplingsimplex-normuvect

DNF · May 10, 2023, 7:16am

You seem to be referring to three plots, is one missing?

jd-foster · May 10, 2023, 7:54am

That is, it looks the same as the top one.

empet · May 10, 2023, 8:04am

Yes, because the plane of eqn x1+x2+x3=1 intetersects the axes Ox1, Ox2, Ox3 at A, B,C.

Topic		Replies	Views
How to create random values given a restriction? New to Julia question	3	1223	September 20, 2019
Is there a quicker way to obtain normalized multivariate random normal draws? Performance statistics , speed-optimization	15	390	May 18, 2024
How do I create an array with random unique numbers in a specific range? New to Julia question , random	20	1220	June 24, 2024
Efficient binary vector New to Julia question	33	3589	July 1, 2018
Creating an array/matrix with a specified range of randomly generated numbers New to Julia	5	20983	September 25, 2018

Randomly generating an array that satisfies a condition

Related topics