Generate random value from a given function, out of box

question

#1

Hi,

I am looking for a simple way to generate a random number according to a given distribution.
The function is not easy to invert analytically and I do not want to do that. I would be happy with a numerical method.

It could tabulate the function at some discrete points and do the inversion numerically, as is done in
https://root.cern.ch/root/html600/src/TF1.cxx.html#gYdi6C

I work in hadron spectroscopy, which sometimes involves large datasets and a lot of Monte Carlo.
The common tool is ROOT with C++ https://root.cern.ch/, but it is C++.
I also use Mathematica, but it is slow and commercial.
Julia might be a good option in between.

As an example, I want:
3 variables (these will be columns, say A B C), 10e7 rows of random values distributed according to some known functions.
A ~ sqrt(1-1/a), in the interval 1:5,
B and C are correlated: B x C ~ sqrt(b)*(b+c)^2, b in 2:6, c in 0:10

Btw, would you suggest using DataFrames or working with a nested array?
Is there a way to apply a function across the columns for each row of a matrix? Something like

map(x->x[1]+x[2], myMatrix)

Thanks


#2

I think that you are looking for Distributions.jl. It has quite a few distributions to choose from and you can use it to make your own distribution which you can sample from.

I am not sure that I understand the question, but the dot syntax lets you broadcast over arrays, e.g.

f.(myMatrix)

applies f elementwise.

On the other hand, if you have a function f that works on 1d arrays and you want to apply it to the columns of a matrix, then you could use a comprehension:

julia> myMatrix = rand(2,2)
2×2 Array{Float64,2}:
 0.283274  0.95737   
 0.492706  0.00804106

julia> f(x) = x[1] + x[2]
f (generic function with 1 method)

julia> [f(myMatrix[:, i]) for i in 1:2]
2-element Array{Float64,1}:
 0.916368
 0.475755

#3

Thank you, Jack.

I looked at the Distributions.jl package. While it is great and has a lot of functionality, it does not seem to be the right tool for my problem. If I understood correctly, the rand method is what one has to implement oneself:

https://juliastats.github.io/Distributions.jl/latest/extends.html

I am sure that trying to implement all possible PDFs is hopeless. The Distributions.jl people have presumably thought about a numerical procedure for an arbitrary function; I just cannot find it.

You are right about the matrix! Since I am new to Julia, it will take some time to get used to it.
Perhaps the question of nested array vs. matrix is mostly about syntactic sugar. Selection (or filtering) looks easier for a nested array, while it takes a bit more work to convert it to the plotting format (an array for x, an array for y). Finally, it is a real problem.


#4

You can use StatsBase.sample with a Weights vector, e.g.

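using StatsBase, Gadfly

x = range(0, stop=π, length=100)   # grid of 100 points on [0, π] (linspace before Julia 0.7)
P = sin.(x)                        # unnormalised density at the grid points
P = P/sum(P)                       # normalise the weights

# draw 10000 grid indices with probabilities P and histogram the sampled x values
r = StatsBase.sample(1:100, Weights(P), 10000)
plot(x=x[r], Geom.histogram)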

http://juliastats.github.io/StatsBase.jl/stable/sampling.html#Sampling-API-1


#5

Jonathan,

thank you. Great, that is almost what I need!
My variable is still discrete, though, while it is important to keep it continuous.
The PDF would then be step-like or, even better, linearly interpolated.


#6

mapslices
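
For reference, a minimal sketch of how mapslices could answer the "apply a function over rows" question from the first post. It assumes Julia 1.x keyword syntax (dims=...); the matrix M and the function f are made up for illustration.

```julia
# Apply a function to every row or column of a matrix with mapslices.
M = [1.0 2.0; 3.0 4.0]

f(x) = x[1] + x[2]                  # works on a 1-d slice

# dims=2: each row is passed to f; the result is a 2×1 matrix
row_sums = mapslices(f, M, dims=2)

# dims=1: each column is passed to f; the result is a 1×2 matrix
col_sums = mapslices(f, M, dims=1)

# the same row-wise result as a plain vector, via a comprehension
row_sums_vec = [f(M[i, :]) for i in 1:size(M, 1)]
```

mapslices keeps the reduced dimension with size 1, so vec(row_sums) gives a flat vector if that is more convenient.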


#7

I wrote a solution for the problem based on StatsBase.sample, following Jonathan's suggestion.
It would be great if something like this existed in StatsBase.

Please let me know if the same can be done better or more easily.

using StatsBase

# Invert the CDF of a density that is linear on [xmin, xmax], going from ymin to ymax;
# t is a uniform random number in [0, 1)
function invertLinear(t,xmin,xmax,ymin,ymax)
    # special case, flat density: the CDF is linear
    ymin == ymax && return xmin + t*(xmax-xmin)
    # otherwise the CDF is quadratic: solve A*u^2 + B*u + C = 0 for u = x - xmin
    A = (ymax-ymin)/(xmax-xmin)/2
    B = ymin
    C = - t*(ymin*(xmax-xmin)+(xmax-xmin)*(ymax-ymin)/2)
    x = xmin + (-B + sqrt(B^2-4*A*C))/(2*A)
    return x
end

function getRandom(func, xmin, xmax, Nvalues, Ndiv)
    x = range(xmin, stop=xmax, length=Ndiv)   # linspace before Julia 0.7
    P = func.(x)
    # set the density to zero where the function is negative
    P = max.(P, 0)
    # the weight of each bin is given by its integral (trapezoid rule)
    Pa = [(P[i]+P[i+1])/2 for i in 1:Ndiv-1]
    Pa = Pa/sum(Pa)
    # generate a set of random bin indices
    inds = StatsBase.sample(1:Ndiv-1, Weights(Pa), Nvalues)
    # convert the set of indices to random values inside each bin
    return [invertLinear(rand(),x[i],x[i+1],P[i],P[i+1]) for i in inds]
end

My application follows:

using Plots
gr()
# generate a sample and plot it
λ(x,y,z) = x^2+y^2+z^2-2*x*y-2*y*z-2*z*x
density(s) = sqrt(λ(25,s,1)*λ(s,1,1))/s
data = getRandom(density, 4,16, 1000000, 100);
histogram(data, bin=200, xlabel="Mpipi^2 (GeV)", ylabel="Entries / 60 MeV", label="Dalitz plot projection")

The next question is how to do the same for a function of two arguments.


#8

This is a well-known problem, which has many solutions, with various trade-offs. A good introduction is "Random Number Generation and Monte Carlo Methods" by James E. Gentle.


#9

Thank you for the reference, Tamas.
Could you please list the solutions in Julia that you are aware of?


#10

Isn’t the idea basically to construct the cumulative distribution function (with bins) and choose the correct bin? You can order the 2d bins in some way in a 1d vector and it reduces to the previous problem as far as I can tell.


#11

Or you could just use a Metropolis Markov chain Monte Carlo method.


#12

Hello David,

yes, I agree, it reduces to the previous problem unless I want to do linear interpolation.
Indeed, Metropolis would be a good option if it is already implemented and easy to call.


#13

Why does it have to be already implemented? It’s about 5 lines. I have some code that I wrote somewhere.


#14

I am not aware of anything besides what is already in Distributions.jl. Implementing some generic methods could be a learning experience.


#15

Next question is the same for the function of two arguments.

You can still use StatsBase.sample in 2D or 3D: flatten your matrix of probabilities into a vector, sample linear indices, and then convert the indices back to coordinates using ind2sub (CartesianIndices in Julia 1.0 and later). I wouldn’t try to do the invertLinear thing then, just increase Ndiv to your desired precision.
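
A rough sketch of the flatten-and-convert-back idea, applied to the B, C example from the first post. The names sample2d, pdf_bc, nb and nc are hypothetical. To stay dependency-free it picks the bin with cumsum and searchsortedfirst from Base in place of StatsBase.sample, and the uniform jitter inside each bin is an addition that keeps the variables continuous.

```julia
using Random

# Unnormalised joint density from the first post: sqrt(b)*(b+c)^2
pdf_bc(b, c) = sqrt(b) * (b + c)^2

# Hypothetical helper: draw n (b, c) pairs using an nb × nc grid of bins.
function sample2d(pdf, brange, crange, n; nb=200, nc=200)
    bedges = range(brange[1], stop=brange[2], length=nb + 1)
    cedges = range(crange[1], stop=crange[2], length=nc + 1)
    bmid = (bedges[1:end-1] .+ bedges[2:end]) ./ 2
    cmid = (cedges[1:end-1] .+ cedges[2:end]) ./ 2
    W = [pdf(b, c) for b in bmid, c in cmid]   # nb × nc matrix of bin weights
    cdf = cumsum(vec(W))                       # flatten (column-major) and accumulate
    cdf ./= cdf[end]
    cart = CartesianIndices(W)                 # linear index -> (i, j)
    out = Matrix{Float64}(undef, n, 2)
    for k in 1:n
        lin = min(searchsortedfirst(cdf, rand()), length(cdf))
        i, j = Tuple(cart[lin])
        # uniform position inside the chosen bin
        out[k, 1] = bedges[i] + rand() * step(bedges)
        out[k, 2] = cedges[j] + rand() * step(cedges)
    end
    return out
end

Random.seed!(1)
BC = sample2d(pdf_bc, (2, 6), (0, 10), 100_000)   # column 1 is B, column 2 is C
```

The density is treated as constant inside each bin, so the precision is set by nb and nc, in the spirit of "just increase Ndiv".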

Otherwise there are some samplers here:

I haven’t tested them though.


#16

I was looking for something similar and found this question. Following the suggestion to use Metropolis Monte Carlo, I wrote the following code to sample N-dimensional vectors r according to a probability distribution function prob_dist(r).

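using Random   # provides randn! (Julia 0.7 and later)

# Metropolis update of r in place; prob_dist(r) has to be defined beforehand
function sample_dist!(r, step_size, n_steps)
	r_trial = similar(r)
	for j = 1:n_steps
		# Gaussian random-walk proposal around the current point
		randn!(r_trial)
		@. r_trial = r + step_size*r_trial
		prob_ratio = prob_dist(r_trial)/prob_dist(r)
		# accept the trial point with probability min(1, prob_ratio)
		prob_ratio >= rand() && copyto!(r, r_trial)
	end
	return r
end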
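
As an illustration of the same idea in one dimension, here is a self-contained sketch that samples variable A from the first post, with density proportional to sqrt(1-1/a) on the interval 1 to 5. The names target and metropolis_chain, the step size, the burn-in length and the thinning factor are all assumptions chosen for the example, not tuned values.

```julia
using Random

# Unnormalised target from the first post: sqrt(1 - 1/a) on [1, 5], zero elsewhere
target(a) = 1 <= a <= 5 ? sqrt(1 - 1/a) : 0.0

# Hypothetical helper: random-walk Metropolis chain of length n started at x0
function metropolis_chain(p, x0, step_size, n)
    chain = Vector{Float64}(undef, n)
    x, px = x0, p(x0)
    for k in 1:n
        y = x + step_size * randn()   # Gaussian proposal
        py = p(y)
        # accept with probability min(1, py/px); points with py == 0 are never accepted
        if py > rand() * px
            x, px = y, py
        end
        chain[k] = x                  # a rejected step repeats the current point
    end
    return chain
end

Random.seed!(1)
chain = metropolis_chain(target, 3.0, 1.0, 200_000)
samples = chain[10_001:10:end]        # drop the burn-in and thin the chain
```

Consecutive points of a Metropolis chain are correlated, which is why the sketch thins the chain before using it as a sample.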

#17

If your function to be sampled from is sufficiently smooth, you can use ApproxFun too.


#18

Thanks for the example. I am also wondering how smooth the result is.
That said, I would also use Metropolis if the problem were in many dimensions.

I can give an update on the solution I wrote before.
My density function is \sqrt{(x-a)(x-b)(x-c)(x-d)}/x for given values of a, b, c, d.
Below I create a function randX() which returns a random value distributed according to the density.

# constants
const mπ = 0.139; const mπ2 = mπ^2;
const mη = 0.548; const mη2 = mη^2;

# density function
λ(x,y,z) = x^2+y^2+z^2-2*x*y-2*y*z-2*z*x
fρ(s,m1sq,m2sq,m3sq,m0sq)=(sqrt(s)>sqrt(m1sq)+sqrt(m2sq)) ? sqrt(λ(s,m1sq,m2sq)*λ(m0sq,s,m3sq))/s : 0.0;

# prepare cumulative function
# (mp2 is not defined in this post; it has to be set before running)
const dX = collect(range((mη+mπ)^2, stop=3^2, length=100))   # bin edges
# density at the bin centres
fX = [fρ(s,mη2,mπ2,mp2,19^2) for s in 0.5*(dX[1:end-1]+dX[2:end])]
const cX = cumsum(fX)/sum(fX);                               # cumulative bin weights

# X distribution
function randX()
    # pick a bin from the cumulative weights; min guards against rounding in cX[end]
    bi = min(searchsortedlast(cX,rand())+1, length(cX))
    # uniform inside the chosen bin
    return dX[bi] + rand()*(dX[bi+1]-dX[bi])
end

One does not need extra libraries, only a method for the quick search.

I noticed that it runs about 3 times faster with const.