Is there a quicker way to obtain normalized multivariate random normal draws?

I want to make n random draws from a multivariate normal distribution and then divide each element of each draw by the norm of that draw. Mathematically, I want to obtain \mathbf{y}_i = \mathbf{x}_i / \|\mathbf{x}_i\| for i = 1, \dots, n, where \mathbf{x}_i \sim \text{N}(\boldsymbol{\mu}, \mathbf{V}).

Here is my current code:

using Distributions, LinearAlgebra
X = rand(MvNormal(mu, Hermitian(V)), n)'   # n draws, one per row
Y = X ./ sqrt.(sum(X.^2, dims=2))          # divide each row by its norm

This is fast, but I’m hoping to make it even quicker. I am using this code in an iterative algorithm where it is called hundreds of times per iteration, so any computational savings will add up and help! However, I’m doubtful it can be made quicker, as I have yet to find a way to do so. This Discourse is my last hope!

Thank you for your time.

This allocates several arrays: one for X, one for the X.^2 temporary (and its row sums), and one for Y. (I’m also not sure whether there are computations happening inside MvNormal that you could precompute if you are re-using mu and V across multiple calls.) You should be able to do better.
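For instance, if you were re-using mu and V across calls, a rough sketch (constructing an MvNormal wraps V in a PDMat, which computes a Cholesky factorization up front, so hoisting the construction out of the hot loop saves that work):

D = MvNormal(mu, Hermitian(V))   # factorizes V once
X = rand(D, n)'                  # later calls only sample; no re-factorization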

Can you post runnable code with typical example values of mu, V, and n? The best solution could depend on these, and posting runnable examples is the best practice anyway if you want to ask for help.


You may be looking for something along these lines:

julia> using Distributions, LinearAlgebra

julia> n = 10;
julia> V = rand(n, n);
julia> D = MvNormal(zeros(n), Hermitian(V*V'));
julia> X = normalize!(rand(D))
10-element Vector{Float64}:
  0.4164718118357489
 -0.13384293852202814
  0.6469353808584453
 -0.16738703557848347
 -0.11171640120444568
  0.11685513640811303
  0.40181104360935266
  0.3343677550039351
 -0.10421558304760534
  0.22769061412809866

Thank you for your response! Unfortunately, the mu and V will be different every time I try to do these computations. Thank you for asking for sample code; it makes sense that this would help here.

using Random, Distributions, LinearAlgebra
n = 1000 # will not change
mu = 2*ones(3) # will change every call
V = [0.74 -0.08 0.34; -0.08 1.57 -0.5; 0.34 -0.5 1.16] # will change every call
X = rand(MvNormal(mu, Hermitian(V)), n)'
Y = X ./ sqrt.(sum(X.^2, dims=2))

I hope this helps!

BTW note that most of the time is spent in the random sampling.

I see, you want n normalized samples:

julia> foreach(normalize!, eachrow(X));

might be it: X now holds normalized rows (no need for Y, which is an extra copy).

To get another copy:

julia> Y = stack(map(normalize, eachrow(X)); dims=1);

might be good.


Thank you for your suggested code, Dan. The foreach() approach speeds up the normalization line (see attached photo). Do you believe there is a quicker way to sample from the multivariate normal? I don’t think there is; otherwise, it would be the currently implemented way, but I figured I may as well ask!

The way MvNormal is sampled is by drawing standard normal vectors and then colouring them with the mean and covariance. The sampling might be made faster by drawing standard normals for all matrix entries at once and then finding a faster way to construct the colouring matrices (that requires a slow matrix decomposition, IIRC).

If the variance generation process happens to be amenable to optimization (depending on the exact process), this might help.
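Roughly, a sketch of the colouring idea (not MvNormal’s exact internals; assumes V is positive definite):

L = cholesky(Hermitian(V)).L   # the slow decomposition step
Z = randn(length(mu), n)       # standard normals for all entries at once
X = mu .+ L * Z                # each column now has mean mu and covariance V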

And a sidenote: when using @benchmark it is better to interpolate global variables (use $X). This sometimes matters.
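For example:

@benchmark $X ./ sqrt.(sum($X .^ 2, dims=2))   # $ interpolates the global X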


Note that if you are doing this over and over, you might want to pre-allocate X and use rand! as well.
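A minimal sketch of that pattern (rand! from Distributions refills the columns of an existing matrix with fresh draws):

X = Matrix{Float64}(undef, length(mu), n)   # allocate once, outside the hot loop
rand!(MvNormal(mu, Hermitian(V)), X)        # overwrite X in place on each call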

However, there’s also the question of the overall structure of the code. If you are working with lots of 3-component and 3x3 arrays, you might want to use StaticArrays.jl, and you might get better performance if you transition away from a Matlab/Numpy “vectorized” style to a more loopy style.

For example, a typical “vectorized” code might look like:

  1. generate 1000 multivariate normal vectors
  2. normalize them
  3. loop over them and do some computation
  4. loop over the results of that computation and do some other computation
  5. etc.

which leaves a lot of performance on the table because you are looping over and over again over the same array (or worse, generating new arrays with each pass). Better to do (in an allocation-free way, e.g. with SVector):

  1. generate 1 multivariate normal vector
  2. normalize it
  3. do some computation with it
  4. do some computation with the result
and then repeat this 1000 times, i.e. put the loop for i = 1:n outside these steps. If it’s done right, this will often be faster (a rough sketch follows below).
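Here is a rough sketch of that structure, where do_work is a hypothetical stand-in for your per-sample computation, and mu and V are assumed to be an SVector and an SMatrix:

using StaticArrays, LinearAlgebra
L = cholesky(V).U'   # factorize once per (mu, V)
acc = 0.0
for i = 1:n
    x = normalize(mu + L * @SVector(randn(3)))   # steps 1 and 2, allocation-free
    acc += do_work(x)                            # steps 3 and 4, fused into the same loop
end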

Yes, there should be: if you always know you have 3x3 covariance matrices, you can use StaticArrays.

Thank you for your responses. I’ll look through StaticArrays this weekend and keep you updated in this thread. I appreciate your help thus far!

For example, let’s compare your method:

function doit(mu, V, n)
    X = rand(MvNormal(mu, Hermitian(V)), n)   # columns are the n draws
    foreach(normalize!, eachcol(X))           # normalize each column in place
    return X
end

to a version that expects static arrays, and returns an array of 3-component SVectors instead of a matrix:

using StaticArrays
function doit(mu::SVector, V::SMatrix, n)
    L = cholesky(V).U'                                        # lower-triangular factor, V = L*L'
    X = [normalize(mu + L * @SVector(randn(3))) for _ = 1:n]  # one normalized draw per element
    return X
end

On my machine, I find that

mu_s = SA[2., 2., 2.]
V_s = SA[0.74 -0.08 0.34; -0.08 1.57 -0.5; 0.34 -0.5 1.16]
@btime doit($mu_s, $V_s, $n);

is almost twice as fast as @btime doit($mu, $V, $n); and if you pre-allocate X then you gain another 10%.

(And, as I said above, it may be even better if you merge this loop with any additional processing you want to do on these random vectors, rather than the traditional “vectorized” style of doing many individual passes over the data with built-in “vectorized” operations, which is only necessary in languages where loops are slow.)

Note that this implementation is essentially pulling out the internals of the MvNormal algorithm: if V has the Cholesky factorization V = L L^*, and r = randn(3), then x = \mu + L r has mean \mu and covariance V.
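(To see why: \mathbb{E}[x] = \mu + L\,\mathbb{E}[r] = \mu since \mathbb{E}[r] = \mathbf{0}, and \text{Cov}(x) = L\,\text{Cov}(r)\,L^* = L I L^* = V.)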


Thank you for providing this example. My d will not always be 3, but the suggested code still seems quicker even with larger d. I found that

function doit(mu::SVector, V::SMatrix, n)
    L = cholesky(V).U'
    X = [normalize(mu + L * @SVector(randn(d))) for _=1:n]
    return X
end

(replacing the 3 in randn with a variable) significantly slowed things down. Is there a way around this, or is that part of the JIT nature of Julia? If so, that’s ok. I’ll just need to manually change the d values if d changes between experiments.

These static arrays seem pretty fast! I’ll try to implement them elsewhere in my algorithm. Thank you for your guidance!

Also, when you mention the pre-allocation of X, I’m not sure I know what you mean. Does this mean that X is defined before the function call, and we just modify it within the function?

The key to exploiting static arrays is that the length must be known at compile time.

One way to do this but still allow it to vary at a high level (not inside tight loops!) is to pass the length as part of a type. One option is to pass it via a Val argument, but it’s even better (more elegant and reliable) to get it from another SVector argument, in this case mu. Unfortunately I’m typing on my phone now, so I can’t show you an example, but I’ll post something later if no one else beats me to it.

PS. Also, since your d is apparently a global variable, this is particularly slow.
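In the meantime, here is roughly what both ideas look like (a sketch with hypothetical names: doit_val takes the dimension as a Val, doit_infer recovers it from mu’s type parameter):

using StaticArrays, LinearAlgebra

function doit_val(mu, L, n, ::Val{d}) where {d}
    return [normalize(mu + L * randn(SVector{d})) for _ = 1:n]
end

function doit_infer(mu::SVector{d}, L, n) where {d}
    return [normalize(mu + L * randn(SVector{d})) for _ = 1:n]
end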


Replacing d with length(mu) is enough. Since mu is an SVector, its length is known at compile time, so it is essentially equivalent to hardcoding the constant. Also, the d causing a slowdown can probably be attributed to it being a global variable, which really annoys Julia (this is mentioned in all the Julia performance tips).
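In the code above, that is (assuming a recent StaticArrays, where @SVector accepts a computed size and the compiler constant-folds length(mu) for an SVector argument):

function doit(mu::SVector, V::SMatrix, n)
    L = cholesky(V).U'
    X = [normalize(mu + L * @SVector(randn(length(mu)))) for _ = 1:n]
    return X
end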


Thank you for the tips! Dan suggested length(mu), which works quite well 🙂

Thank you for your help, Dan! I greatly appreciate your time.