Generate random value from a given function, out of box

misha_mikhasenko · September 8, 2017, 10:27pm

Hi,

I am looking for a simple way to generate random numbers according to a given distribution.
The function is not easy to invert analytically and I would rather not do it by hand. I would be happy to find a numerical method.

It could digitize the function at some discrete points and make the inversion numerically, as it is done in

I work in hadron spectroscopy, which sometimes involves large datasets and a lot of Monte Carlo.
The common tool is ROOT (https://root.cern.ch/), but it is C++.
I also use Mathematica, but it is slow and commercial.
Julia might be a good option in between.

As an example, I want:
3 variables (these will be columns, say A, B, C), 10e7 rows of random values distributed according to some known functions.
A ~ sqrt(1-1/a), in the interval 1:5,
B and C are correlated: B x C ~ sqrt(b)*(b+c)^2, b in 2:6, c in 0:10

Btw, would you suggest using DataFrames or working with a nested array?
Is there a way to apply a function to the columns, row by row, of a matrix? Something like

map(x->x[1]+x[2], myMatrix)

Thanks

JackDevine · September 8, 2017, 11:35pm

I think that you are looking for Distributions.jl. It has quite a few distributions to choose from and you can use it to make your own distribution which you can sample from.

I am not sure that I understand the question, but . syntax lets you broadcast over arrays e.g.

f.(myMatrix)

applies f elementwise.

On the other hand, if you have a function f that works on 1d arrays and you want to apply it to the columns of a matrix, then you could use a comprehension:

julia> myMatrix = rand(2,2)
2×2 Array{Float64,2}:
 0.283274  0.95737   
 0.492706  0.00804106

julia> f(x) = x[1] + x[2]
f (generic function with 1 method)

julia> [f(myMatrix[:, i]) for i in 1:2]
2-element Array{Float64,1}:
 0.77598
 0.965411

misha_mikhasenko · September 9, 2017, 12:25pm

Thank you, Jack.

I looked at the Distributions.jl package. While it is great and has a lot of functionality, it does not seem to be the right tool for my problem. If I understood correctly, it is the rand method that I would have to implement myself,

https://juliastats.github.io/Distributions.jl/latest/extends.html

I am sure that trying to implement all possible PDFs is hopeless. The Distributions.jl people must have thought about a numerical procedure for an arbitrary function. I just cannot find it.

You are right about the matrix! Since I am new to Julia, it will take some time to get used to it.
Perhaps the array vs. matrix question is about syntactic sugar. Selection (or filtering) looks easier for a nested array, while it takes a bit more work to convert it to the plotting format (an array for x, an array for y). Finally, it is a real problem.

jonathanBieler · September 9, 2017, 2:14pm

You can use StatsBase.sample with a Weights vector, e.g.

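using StatsBase, Gadfly

x = range(0, stop=π, length=100)   # grid of 100 points on [0, π]
P = sin.(x)                        # evaluate the density elementwise
P = P/sum(P)                       # normalize the weights to sum to 1

# draw 10000 grid indices with probabilities P, then histogram the grid points
r = StatsBase.sample(1:100, Weights(P), 10000)
plot(x=x[r], Geom.histogram)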

http://juliastats.github.io/StatsBase.jl/stable/sampling.html#Sampling-API-1

misha_mikhasenko · September 9, 2017, 2:56pm

Jonathan,

thank you. Great, that is almost what I need!
With this approach my variable is still discrete, while it is important for me to keep it continuous.
The PDF could then be treated as step-like or, even better, linearly interpolated.

stevengj · September 9, 2017, 9:53pm

mapslices
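
A minimal sketch of how mapslices could be applied to the row-wise function from the first post. The matrix values are made up for illustration, and the eachrow variant assumes Julia 1.1 or later.

```julia
# Hypothetical 3×2 matrix; each row plays the role of one event.
myMatrix = [1.0 2.0;
            3.0 4.0;
            5.0 6.0]

# dims=2 hands every row to the function as a vector.
rowsums = mapslices(x -> x[1] + x[2], myMatrix, dims=2)

# The result keeps a singleton second dimension (3×1); vec flattens it.
rowsums_vec = vec(rowsums)

# An equivalent comprehension over rows, without mapslices.
rowsums_alt = [r[1] + r[2] for r in eachrow(myMatrix)]
```

For a simple function like this one, the comprehension is usually just as readable; mapslices is mainly convenient when the function already expects a whole vector.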

misha_mikhasenko · September 19, 2017, 8:01am

I wrote a solution for the problem based on StatsBase.sample, following Jonathan's suggestion.
It would be great if something like this existed in StatsBase.

Please let me know if the same can be done better or more easily.

using StatsBase

function invertLinear(t,xmin,xmax,ymin,ymax)
    # special case: flat density in the bin, so the CDF is linear
    ymin == ymax && return xmin + t*(xmax-xmin)
    # otherwise the CDF is quadratic in u = x - xmin; solve A*u^2 + B*u + C = 0
    A = (ymax-ymin)/(xmax-xmin)/2
    B = ymin
    C = -t*(ymin*(xmax-xmin)+(xmax-xmin)*(ymax-ymin)/2)
    x = xmin + (-B + sqrt(B^2-4*A*C))/(2*A)
    return x
end

function getRandom(func, xmin, xmax, Nvalues, Ndiv)
    x = range(xmin, stop=xmax, length=Ndiv)
    P = func.(x)
    # set the function to zero where it is negative
    P = [max(v, zero(v)) for v in P]
    # the weight of each bin is its trapezoidal integral (the constant bin width cancels)
    Pa = [(P[i]+P[i+1])/2 for i in 1:Ndiv-1]
    Pa = Pa/sum(Pa)
    # generate a set of random bin indices
    inds = StatsBase.sample(1:Ndiv-1, Weights(Pa), Nvalues)
    # convert the bin indices to random values inside each bin
    return [invertLinear(rand(),x[i],x[i+1],P[i],P[i+1]) for i in inds]
end

My application follows

using Plots
gr()
# generate a sample and plot it
λ(x,y,z) = x^2+y^2+z^2-2*x*y-2*y*z-2*z*x
density(s) = sqrt(λ(25,s,1)*λ(s,1,1))/s
data = getRandom(density, 4,16, 1000000, 100);
histogram(data, bin=200, xlabel="Mpipi^2 (GeV)", ylabel="Entries / 60 MeV", label="Dalitz plot projection")

Next question is the same for the function of two arguments.

Tamas_Papp · September 19, 2017, 8:29am

This is a well-known problem, which has many solutions, with various trade-offs. A good introduction is Random number generation and Monte Carlo methods, by James E. Gentle.

misha_mikhasenko · September 19, 2017, 12:32pm

Thank you for the reference, Tamas.
Could you please list the solutions in Julia that you are aware of?

dpsanders · September 19, 2017, 1:02pm

Isn’t the idea basically to construct the cumulative distribution function (with bins) and choose the correct bin? You can order the 2d bins in some way in a 1d vector and it reduces to the previous problem as far as I can tell.
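
A minimal sketch of this idea, assuming a regular grid and using only Base functions. The name sample_grid_2d and the grid sizes are illustrative assumptions, not from any package, and the density is the B, C example from the first post.

```julia
# Flatten the 2D bins into one vector, build the cumulative sum,
# and pick a bin by searching that vector with a uniform random number.
function sample_grid_2d(f, brange, crange, nb, nc, nsamples)
    bedges = range(brange[1], stop=brange[2], length=nb + 1)
    cedges = range(crange[1], stop=crange[2], length=nc + 1)
    bmid = (bedges[1:end-1] .+ bedges[2:end]) ./ 2
    cmid = (cedges[1:end-1] .+ cedges[2:end]) ./ 2

    # Bin weights from the density at bin centres; negative values are clipped to zero.
    w = [max(f(b, c), 0.0) for b in bmid, c in cmid]   # nb × nc matrix
    cdf = cumsum(vec(w))                                # column-major flattening
    cdf ./= cdf[end]

    cart = CartesianIndices(w)                          # linear index -> (i, j)
    out = Matrix{Float64}(undef, nsamples, 2)
    for n in 1:nsamples
        k = min(searchsortedfirst(cdf, rand()), length(cdf))
        i, j = Tuple(cart[k])
        # Uniform jitter inside the chosen bin keeps the variables continuous.
        out[n, 1] = bedges[i] + rand() * (bedges[i+1] - bedges[i])
        out[n, 2] = cedges[j] + rand() * (cedges[j+1] - cedges[j])
    end
    return out
end

# Density from the first post: sqrt(b)*(b+c)^2 with b in 2:6 and c in 0:10.
bc = sample_grid_2d((b, c) -> sqrt(b) * (b + c)^2, (2, 6), (0, 10), 200, 200, 10_000)
```

The density is treated as constant inside each bin, so the precision is set by the number of bins rather than by any interpolation.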

dpsanders · September 19, 2017, 1:04pm

Or you could just use a Metropolis Markov chain monte carlo method.

misha_mikhasenko · September 19, 2017, 1:17pm

Hello David,

yes, I agree, it reduces to the previous problem unless I want to do linear interpolation.
Indeed, Metropolis would be a good option if it is already implemented and easy to call.

dpsanders · September 19, 2017, 1:38pm

Why does it have to be already implemented? It’s about 5 lines. I have some code that I wrote somewhere.

Tamas_Papp · September 19, 2017, 1:49pm

I am not aware of anything besides what is already in Distributions.jl. Implementing some generic methods could be a learning experience.

jonathanBieler · September 19, 2017, 2:44pm

Next question is the same for the function of two arguments.

You can still use StatsBase.sample in 2D or 3D: you just vectorize your matrix of probabilities and then convert the indices back to coordinates using ind2sub (in Julia 1.0 and later, CartesianIndices takes its place). I wouldn’t try to do the invertLinear thing then, just increase Ndiv to your desired precision.

Otherwise there’s some samplers here:

I haven’t tested them though.

jdnz · December 20, 2017, 3:51am

I was looking for something similar and found this question. Following the suggestion to use Metropolis Monte Carlo, I wrote the following code to sample N-dimensional vectors r according to a probability distribution function prob_dist(r).

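using Random  # provides randn!

# prob_dist(r) must be defined beforehand; it does not need to be normalized
function sample_dist!(r, step_size, n_steps)
    r_trial = similar(r)
    for j = 1:n_steps
        # propose a Gaussian step around the current point
        randn!(r_trial)
        @. r_trial = r + step_size*r_trial
        # accept the proposal with probability min(1, prob_ratio)
        prob_ratio = prob_dist(r_trial)/prob_dist(r)
        prob_ratio >= rand() && copyto!(r, r_trial)
    end
    return r
end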
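
A hedged variant of the same random-walk Metropolis idea that records many points instead of returning only the final one. The name metropolis_chain and the burnin and thin keywords are illustrative assumptions. Unlike the code above, it takes the log of the density as an argument, which avoids dividing by very small numbers, and the starting point must have nonzero density.

```julia
using Random

# Random-walk Metropolis that stores every `thin`-th point after `burnin` steps.
function metropolis_chain(logp, r0::Vector{Float64}, step_size, nsamples;
                          burnin=1_000, thin=10)
    r = copy(r0)
    r_trial = similar(r)
    lp = logp(r)
    chain = Matrix{Float64}(undef, nsamples, length(r))
    for n in 1:(burnin + nsamples * thin)
        # Gaussian proposal around the current point.
        randn!(r_trial)
        @. r_trial = r + step_size * r_trial
        lp_trial = logp(r_trial)
        # Accept with probability min(1, p(trial)/p(current)), compared in log space.
        if log(rand()) < lp_trial - lp
            copyto!(r, r_trial)
            lp = lp_trial
        end
        # Record the current point once per `thin` steps after burn-in.
        if n > burnin && (n - burnin) % thin == 0
            chain[(n - burnin) ÷ thin, :] = r
        end
    end
    return chain
end

# Example: a 2D standard normal, given by its log-density up to a constant.
chain = metropolis_chain(r -> -sum(abs2, r) / 2, [0.0, 0.0], 1.0, 5_000)
```

Consecutive points of a Metropolis chain are correlated, so thinning and discarding the first steps are common precautions; the values used here are arbitrary and would need tuning for a real density.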

rveltz · December 20, 2017, 7:40am

if your function to be sampled from is sufficiently smooth, you can use ApproxFun too

misha_mikhasenko · December 20, 2017, 3:50pm

Thanks for the example. I am also wondering how smooth the result is.
That said, I would also use Metropolis if the problem were in many dimensions.

Here is an update on the solution I wrote before.
My density function is \sqrt{(x-a)(x-b)(x-c)(x-d)}/x for given values of a, b, c, d.
Below I create a function randX() which returns a random value distributed according to the density.

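# constants (masses in GeV)
const mπ = 0.139; const mπ2 = mπ^2;
const mη = 0.548; const mη2 = mη^2;
# assumption: mp2 is the proton mass squared; it was used below but not defined in the original post
const mp = 0.938; const mp2 = mp^2;

# density function
λ(x,y,z) = x^2+y^2+z^2-2*x*y-2*y*z-2*z*x
fρ(s,m1sq,m2sq,m3sq,m0sq)=(sqrt(s)>sqrt(m1sq)+sqrt(m2sq)) ? sqrt(λ(s,m1sq,m2sq)*λ(m0sq,s,m3sq))/s : 0.0;

# prepare cumulative function
const dX = collect(range((mη+mπ)^2, stop=3^2, length=100))  # bin edges
# normalized bin weights, evaluated at the bin centres
fX = [fρ(s,mη2,mπ2,mp2,19^2) for s in 0.5*(dX[1:end-1]+dX[2:end])]; fX /= sum(fX)
const cX = cumsum(fX);  # cX[i] is the total weight of bins 1..i

# X distribution
function randX()
    # pick the bin whose cumulative interval contains a uniform random number
    # (min guards against cX[end] rounding to slightly below 1)
    bi = min(searchsortedlast(cX,rand())+1, length(cX))
    # uniform position inside the chosen bin
    return dX[bi] + rand()*(dX[bi+1]-dX[bi])
end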

One does not need extra libraries, only a method for the quick search.

I noticed that it runs about 3 times faster when the globals are declared const.
