**Summary**

I have written a Julia function to compute the Multivariate Normal integral.

I need a partner who is an expert in Distributions.jl to properly integrate it into the code base.

**Introduction**

I have written a function in Julia to compute the central Multivariate Normal probabilities:

\Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma} ) = \frac{1}{\sqrt{\left | \mathbf{\Sigma} \right |{(2\pi )^k}}}\int_{a_1}^{b_1}\int_{a_2}^{b_2}\begin{align*} &...\end{align*} \int_{a_k}^{b_k}e^{^{-\frac{1}{2}}\mathbf{x}^t{\mathbf{\Sigma }}^{-1}\boldsymbol{\textbf{x}}}dx_k...dx_1

where \Sigma is a symmetric, positive definite *k* x *k* covariance matrix. Vector **a** is the lower integration limit, and **b** is the upper integration limit. As this is the “central” MVN, \mu = \vec{0}.

This is a translation of the MATLAB qsimvnv function written by Prof. Alan Genz.

**Examples:**

```
julia> r = [4 3 2 1;3 5 -1 1;2 -1 4 2;1 1 2 5]
4×4 Array{Int64,2}:
4 3 2 1
3 5 -1 1
2 -1 4 2
1 1 2 5
julia> a = [-Inf; -Inf; -Inf; -Inf]
4-element Array{Float64,1}:
-Inf
-Inf
-Inf
-Inf
julia> b = [1; 2; 3; 4 ]
4-element Array{Int64,1}:
1
2
3
4
julia> (p,e)=qsimvnv(r,a,b)
(0.6049240592769761, 0.0016812280451818128)
```

In this example, 0.604924… is the computed probability, and 0.00168122… is the estimated error of the calculation. The function uses quasi-random integration to compute the integral, so you get a slightly different result each time it’s run. The results of the function have been checked for correctness by comparing them to R, MATLAB, and published results.

```
julia> (p,e)=qsimvnv(r,a,b)
(0.6060880398660634, 0.001540997560720256)
julia> (p,e)=qsimvnv(r,a,b)
(0.6060880398660634, 0.001540997560720256)
julia> (p,e)=qsimvnv(r,a,b)
(0.6067495056529963, 0.0013883605551699985)
```

You can increase the iterations to reduce the error. The default is 1000*dimension of \Sigma. Decrease in error is roughly linearly with increase in iterations:

```
julia> (p,e)=qsimvnv(r,a,b;m=5000)
(0.6061177664363478, 0.0018974040888668662)
julia> (p,e)=qsimvnv(r,a,b;m=6000)
(0.605451604851963, 0.0009217826063167869)
julia> (p,e)=qsimvnv(r,a,b;m=7000)
(0.6057328187356109, 0.0009514371151440057)
julia> (p,e)=qsimvnv(r,a,b;m=80000)
(0.6056921305482306, 0.0002024445811897692)
julia> (p,e)=qsimvnv(r,a,b;m=9000)
(0.6060092106313155, 0.0009367451258961832)
julia> (p,e)=qsimvnv(r,a,b;m=10000)
(0.605821023964934, 0.0005674428432139305)
```

This could be the basis of a MVN cdf function for Distributions.jl which operates with the MvNormal objects.

```
julia> r=float(r)
4×4 Array{Float64,2}:
4.0 3.0 2.0 1.0
3.0 5.0 -1.0 1.0
2.0 -1.0 4.0 2.0
1.0 1.0 2.0 5.0
julia> mu=[0, 0, 0, 0]
4-element Array{Int64,1}:
0
0
0
0
julia> mvn1 = MvNormal(mu, r)
FullNormal(
dim: 4
μ: [0.0, 0.0, 0.0, 0.0]
Σ: [4.0 3.0 2.0 1.0; 3.0 5.0 -1.0 1.0; 2.0 -1.0 4.0 2.0; 1.0 1.0 2.0 5.0]
)
julia> mvn_cdf(D::AbstractMvNormal, a::AbstractVector, b::AbstractVector; m::Union{Integer, Nothing} = nothing) = qsimvnv(D.Σ.mat,a .- D.μ, b .- D.μ;m=m)
mvn_cdf (generic function with 1 method)
julia> mvn_cdf(mvn1,a,b;m=5000)
(0.6056845764526123, 0.0015238130294856398)
```

**What I have**

- Julia implementation of qsimvnv()
- Julia implementation of Prof. Genz’s function that 1) scales and re-arranges the integration vectors, and 2) computes the cholesky decomposition of \Sigma. This is called by qsimvnv().
- Several test cases, consisting of matrices of various sizes, the expected probability and error estimate results, and tolerances for each. This could form the basis of test code.

**What I Don’t Have**

- A good understanding of the structure of the Distributions.jl code and all of the different MvNormal objects. When I look at the code, I can’t even figure out where I would put the two functions I have.
- Programming skills with Julia types, methods, multiple dispatch. The “mvn_cdf” example above is the pinnacle of my achievement, and it doesn’t work unless you specify m= in it (i.e., not specifying m to use the default results in an error).
- Git experience

**What I Need**

- I need a partner who is an expert in the structure of Distributions.jl and the MvNormal objects, who knows how to integrate it into the code base and write the methods for all of the different MvNormal objects. I would expect his partner would also identify improvements to the code I’ve written for qsimvnv() and _chlrdr (the cholesky decomposition function).
- Note that MvNormal objects can have nonzero mean vectors; to use qsimvnv(), the \mu vector needs to be subtracted from the integration limits
**a**and**b**. - Some of the MvNormal objects use Correlation matrices instead of Covariance matrices. These need to be converted to covariance matrices before calling the function.
- Given a template to work from, I can probably contribute to writing the test code. I can also run tests, and I can compare the results to MATLAB as needed.

**Next Steps**

If someone wants to partner up on this, let me know what to do next. I could create a PR to dump the code in to the /src directory, but my partner would have to fix it afterwards. Or … let me know how to proceed.

**Appendix - Statistics Notes**

One way to define the Multivariate Normal CDF is for “simultaneous X” – that is, a for a given random vector \vec{x}, the probability that every component of \vec{x} is less than a reference vector, \vec{x_0}:

F(\vec{x_{0}}) = P(\vec{x} \leqslant \vec{x_{0}})

which can be generalized to:

F(\vec{a},\vec{b}) = P(\vec{a} \leqslant \vec{x} \leqslant \vec{b})

Fn(\vec{a},\vec{b};\mu,\Sigma) = P(ai < Xi < bi;i=1...n)

F_{n}(\vec{a},\vec{b};\vec{\mu} ,\Sigma ) = P(a_{i}<X_{i}<b_{i}; i=1...n) = \int_{a_{1}}^{b_{1}}\int_{a_{2}}^{b_{2}}\cdots \int_{a_{i}}^{b_{i}}f(x_{1},x_{2},...,x_{i})dx_{1}dx_{2}\cdots dx_{i}

We define:

F_{n}(\vec{a},\vec{b};\vec{\mu} ,\Sigma ) = \Phi_k(\mathbf{a},\mathbf{b},\mathbf{\Sigma} )

*where:*

\Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma} ) = \frac{1}{\sqrt{\left | \mathbf{\Sigma} \right |{(2\pi )^k}}}\int_{a_1}^{b_1}\int_{a_2}^{b_2}\begin{align*} &...\end{align*} \int_{a_k}^{b_k}e^{^{-\frac{1}{2}}\mathbf{x}^t{\mathbf{\Sigma }}^{-1}\boldsymbol{\textbf{x}}}dx_k...dx_1

Note that you’re only really computing the CDF if \vec{a} = -\infty.

If \vec{a}=\vec{0} and \vec{b} = +\vec{\infty}, this is called the *orthant probability.*

**Non-Central MVN**

The *non-central MVN integral* is given by:

\Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma},\delta ) = \frac{1}{\sqrt{\left | \mathbf{\Sigma} \right |{(2\pi )^k}}}\int_{a_1}^{b_1}\int_{a_2}^{b_2}\begin{align*} &...\end{align*} \int_{a_k}^{b_k}e^{^{-\frac{1}{2}}\mathbf{(x-\delta)}^t{\mathbf{\Sigma }}^{-1}\mathbf{(x-\delta)}}dx_k...dx_1

where \delta denotes the k x 1 non-centrality vector with -{\infty} < \delta_i < +{\infty}.

The non-central MVN integral can determined from the central MVN with shifted integration limits:

\Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma},\delta ) = \Phi_k(\mathbf{a-\delta},\mathbf{b-\delta };\mathbf{\Sigma})

**Standardized MVN**

The non-central MVN probability \Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma},\delta ) can also be transformed into the *standardized MVN* probability. The standardized MVN has \vec{\mu} = \vec{0} and \Sigma = I_n.

Let \mathbf{D} denote the diagonal matrix of the square roots of the diagonal entries of \Sigma. The correlation matrix \mathbf{R} is defined by \mathbf{\Sigma = DRD}. Then the transformation \mathbf{x = Dy + \delta} reduces the non-central MVN probability to

\Phi_k(\mathbf{a},\mathbf{b};\mathbf{\Sigma},\delta ) = \Phi_k(\mathbf{D^{-1}(a-\delta),D^{-1}(b-\delta)};\mathbf{R})

**Mahalanobis Distance**

The *other way* to think about the MVN probability is that of a sample vector \vec{x_0} falls within the ellipsoid defined by MVN(\vec{\mu},\Sigma). This has a closed form solution. If you’re interested in this, see Distribution of observation-level Mahalanobis distance.