I am estimating a model. In the process of estimation I have to convert a fixed vector of independent standard normal draws into a correlated multivariate normal vector. The way I am doing that uses the fact that
\bf{x} \sim N(\bf{0}, I) \Rightarrow \bf{S}\bf{x} \sim N(\bf{0}, \bf{S}\bf{S}').
That means that if \Sigma is positive semidefinite, I can obtain a matrix \bf{S} with \bf{S}\bf{S}' = \Sigma and premultiply vector \bf{x} by \bf{S} to obtain a correlated normal vector. The problem I’m facing is that I am parameterizing the matrix \Sigma = \bf{S}\bf{S}'. Say it is 3x3:
Σ = [s11 s12 s13;
     s12 s22 s23;
     s13 s23 s33]
Those are the parameters I am supplying to the optimizer. I am obtaining \bf{S} from \Sigma using a Cholesky decomposition.
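Just as a sanity check that the premultiplication step does what I describe, here is a minimal sketch (the particular S below is an arbitrary example, and the covariance match is only approximate for a finite number of draws):

using LinearAlgebra, Statistics

S = [ 1.0  0.0  0.0;        # an arbitrary example factor
      0.5  1.0  0.0;
     -0.3  0.2  1.0]
x = randn(3, 100_000)        # independent standard normal draws, one per column
y = S * x                    # correlated draws

cov(y, dims = 2)             # ≈ S * S' for a large number of draws
S * S'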
How can I impose restrictions on the components of Σ such that it is positive semidefinite (and therefore the Cholesky decomposition exists)?
I thought about parameterizing \bf{S} instead, and then constructing Σ = S * transpose(S). The problem in that case is that Σ ends up having all positive values and I want to allow for some negative components in it (allowing for negative covariance between components of \bf{x}).
Here is a simple example:
using LinearAlgebra

x = randn(3, 100)   # 100 draws of a 3-dimensional standard normal vector (one draw per column)

f(y) = sum(y)       # placeholder for the real objective

function obj(s; x)
    s11, s12, s13 = s[1], s[2], s[3]
    s22, s23, s33 = s[4], s[5], s[6]
    Σ = [s11 s12 s13;
         s12 s22 s23;
         s13 s23 s33]
    S = cholesky(Σ).L   # errors unless Σ is positive definite
    y = S * x           # correlated draws with covariance SS' = Σ
    return f(y)
end
Again, I want to be able to supply function obj to an optimizer, which means that it should be able to accept any vector s. What restrictions can I place on Σ to ensure it is positive semidefinite? For example, the main diagonal elements must be non-negative, which can be achieved with an exp transform.
What I want is to make function obj able to accept any inputs, transform them, and make a psd matrix from them. That way, I can safely supply function obj to an optimizer without having to worry whether the function will error because it can’t construct psd Σ.
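For concreteness, here is a minimal sketch of the kind of transform I have in mind (the exp on the diagonal is just one possible choice, and unconstrained_to_factor / obj_unconstrained are names I made up for illustration):

using LinearAlgebra

# Map an unconstrained vector of length 6 to a lower triangular factor with a
# strictly positive diagonal, so that L * L' is always symmetric positive definite.
function unconstrained_to_factor(s)
    L = [exp(s[1])  0.0        0.0;
         s[2]       exp(s[4])  0.0;
         s[3]       s[5]       exp(s[6])]
    return LowerTriangular(L)
end

function obj_unconstrained(s; x)
    L = unconstrained_to_factor(s)   # no cholesky call needed: L is already the factor
    y = L * x                        # correlated draws with covariance L * L'
    return f(y)
end

# After optimization, the implied covariance matrix is L * L' evaluated at the estimates.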
I thought about parameterizing \bf{S} instead, and then constructing Σ = S * transpose(S). The problem in that case is that Σ ends up having all positive values and I want to allow for some negative components in it (allowing for negative covariance between components of \bf{x}).
I think this is probably the better approach. That said, I’m not sure why you would be seeing Σ end up with all positive values. Were you making S symmetric rather than triangular?
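For example (a minimal sketch), a lower triangular S with negative off-diagonal entries gives a Σ = S S' with negative covariances:

using LinearAlgebra

S = LowerTriangular([ 1.0  0.0  0.0;
                     -0.8  1.0  0.0;
                      0.3 -0.5  1.0])
Σ = S * S'
# Σ[2,1] == -0.8 and Σ[3,2] ≈ -0.74, so negative covariances are perfectly possible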
I’m not exactly sure how to solve your problem, but there’s the NearestCorrelationMatrix.jl library that can find a positive definite correlation matrix from an input matrix. Can you compute the covariance matrix from the correlation matrix?
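If it helps, going from a correlation matrix back to a covariance matrix is just a rescaling by the standard deviations. A minimal sketch (R and σ below are made-up inputs; R would come from the nearest-correlation step):

using LinearAlgebra

R = [ 1.0  0.4 -0.2;          # a made-up valid correlation matrix
      0.4  1.0  0.1;
     -0.2  0.1  1.0]
σ = [2.0, 0.5, 1.5]            # standard deviations of the three components
D = Diagonal(σ)
Σ = D * R * D                  # covariance matrix whose implied correlation matrix is R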
For optimization over constrained spaces, I usually reach for TransformVariables.jl, which provides transformations between unconstrained vectors and constrained objects. In particular, for covariance matrices check out corr_cholesky_factor.
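A rough sketch of how that might look, based on my reading of the TransformVariables.jl docs (check the docstring of corr_cholesky_factor for the exact type and orientation of the returned factor):

using TransformVariables

t = corr_cholesky_factor(3)      # transformation to the Cholesky factor of a 3×3 correlation matrix
d = dimension(t)                 # number of unconstrained parameters, n(n-1)/2 = 3
F = transform(t, randn(d))       # a valid Cholesky factor of a correlation matrix

Note that this targets correlation matrices, so the standard deviations would still be parameterized separately (e.g. via an exp transform).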
In case you want to get really fancy, you can also use Manopt.jl and optimize over the manifold of symmetric positive definite matrices directly.
I was checking TransformVariables.jl; the author says that the densities need to be adjusted. Why is that necessary? What I had in mind was to employ the delta method to obtain the correct variances of the transformed parameters (those in \Sigma) from the obtained estimates (those in S).
When you change your parametrization, e.g., mapping from an unconstrained vector x \in \mathbb{R}^n to some constrained object \theta \in \Theta, the density changes accordingly. Let g: x \mapsto \theta; then
\log p_x(x) = \log p_\theta(g(x)) + \log \left| \det \frac{d\theta}{dx} \right|,
where \log \left| \det \frac{d\theta}{dx} \right| denotes the log Jacobian of g.
E.g., in Bayesian estimation this is important when the prior is defined on \theta, but the sampling is done on x. When optimizing the likelihood this is usually not required, as the maximum is not affected by the coordinate transform, e.g., \mathrm{argmax}_{\theta} p(D | \theta) = g\left( \mathrm{argmax}_{x} p(D | g(x))\right).
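To illustrate that invariance, a minimal sketch using Optim.jl and a made-up normal sample (only the parametrization of σ differs between the two fits):

using Optim

data = 2.5 .* randn(1_000)                               # made-up sample with true σ = 2.5

# negative log-likelihood of N(0, σ²), up to additive constants
nll(σ) = length(data) * log(σ) + sum(abs2, data) / (2σ^2)

res_direct      = optimize(nll, 1e-6, 10.0)              # optimize over σ itself
res_transformed = optimize(s -> nll(exp(s)), -5.0, 5.0)  # optimize over s = log(σ)

σ_hat_direct      = Optim.minimizer(res_direct)
σ_hat_transformed = exp(Optim.minimizer(res_transformed))
# σ_hat_direct ≈ σ_hat_transformed: no Jacobian correction needed for the point estimate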
Don’t know how/if that applies to the delta method … hope it helps anyways.
Thank you! Is there any name for that? Can you provide some references where I can read more about it?
What I had in mind was to maximize the likelihood with respect to the unconstrained vector, obtain the variance-covariance matrix of those parameters, and then use the delta method to obtain the variance-covariance matrix of the transformed parameters:
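That is, something along these lines (my notation: g is the map from the unconstrained vector x to the constrained parameters \theta, and J is its Jacobian evaluated at the estimate):
\widehat{\operatorname{Var}}(\hat{\theta}) \approx J \, \widehat{\operatorname{Var}}(\hat{x}) \, J', \qquad J = \frac{\partial g}{\partial x}(\hat{x})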
In the context of Bayesian modeling, the Stan manual discusses reparametrization and change of variables.
If I get your case correctly, you maximize the likelihood on the unconstrained space, approximate the uncertainty there via the Laplace approximation, i.e., using the (negative) Hessian as the inverse variance, and then use another approximation – the delta method – to obtain the variance on the constrained space? Here, I would probably skip the delta method altogether and simply use the Laplace approximation on the constrained space directly, i.e., compute \frac{\partial^2 \ell}{\partial \theta \, \partial \theta'}(\hat{\theta}) directly. In any case, you won’t need to correct for the change of variables, as the transformation is applied to the point estimates \hat{x} \to \hat{\theta} only.
Any symmetric positive definite (SPD) matrix has a unique Cholesky decomposition M = L*L', where L is lower triangular with a strictly positive diagonal. Hence the space of n by n SPD matrices has dimension n*(n+1)/2, and a way to constrain an algorithm to explore all such matrices, and only such matrices, is to use the n*(n+1)/2 nonzero (triangular) coefficients of L as the unknowns.
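A minimal sketch of that parameterization for general n (exponentiating the diagonal keeps the factor unique and the product strictly positive definite; spd_from_vector is a name made up for illustration):

using LinearAlgebra

# Map a vector of n*(n+1)/2 unconstrained parameters to an n×n SPD matrix
# by filling the lower triangle of L column by column.
function spd_from_vector(v, n)
    @assert length(v) == n * (n + 1) ÷ 2
    L = zeros(n, n)
    k = 1
    for j in 1:n, i in j:n
        L[i, j] = (i == j) ? exp(v[k]) : v[k]   # positive diagonal ⇒ unique factor
        k += 1
    end
    return L * L'                               # symmetric positive definite by construction
end

Σ = spd_from_vector(randn(6), 3)
isposdef(Σ)                                     # true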