Hello,
I’m editing this question to make it clearer since I got no solution still .
I need to implement a NN that outputs a covariance matrix. Say we have N variables, I need to be able to compute a symetric NxN positive semi-definite matrix from the output of the NN.
The authors of the algorithm I’m trying to implement use a vector output of length Nx(N+1)/2.
When they sample, this vector is transformed into a lower triangular matrix and multiplied by its transpose to obtain a symetric NxN positive semi-definite matrix, treated as the covariance matrix.
That is the output vector is the elements of the cholesky decomposition of the covariance matrix.
I found a pytorch example of this, here’s the relevant part:
#da is what I call N
cholesky_vector = self.cholesky_layer(x) #Vector of length Nx(N+1)/2
cholesky_diag_index = torch.arange(da, dtype=torch.long) + 1 # julia equivalent is [1:da;]
cholesky_diag_index = (cholesky_diag_index * (cholesky_diag_index + 1)) // 2 - 1 # computes the indices of the future diagonal elements of the matrix
cholesky_vector[:, cholesky_diag_index] = F.softplus(cholesky_vector[:, cholesky_diag_index]) #softplus projects the diagonal to >0.
tril_indices = torch.tril_indices(row=da, col=da, offset=0) # Collection that contains the indices of the non-zero elements of a lower triangular matrix
cholesky = torch.zeros(size=(B, da, da), dtype=torch.float32).to(device) #initialize a square matrix to zeros
cholesky[:, tril_indices[0], tril_indices[1]] = cholesky_vector # Assigns the elements of the vector to their correct position in the lower triangular matrix
where da
is the number of dimensions of the distribution, what I call N (we deal with a da x da
matrix that is)
Here’s how I tried to reimplement this in julia, accounting for the fact that we are now in a column major setting (I need to make an upper triangular matrix instead of lower) and with one-based arrays. I took an example for a matrix with da = N = 4
using Flux, LinearAlgebra
N = 4
cholesky_vector = rand(N*(N+1)÷2 ,1) # output of the vector
function sigma(v)
cholesky_diag_index = [1:N;] .* [2:(N+1);] ÷ 2 #indices of the elements of cholesky_vector that constitute the diagonal of the UT matrix
softplus(x) = log(1+exp(x))
cholesky_vector[cholesky_diag_index] .= softplus.(cholesky_vector[cholesky_diag_index]) # make sure the standard deviations are > 0
triu_indices = [CartesianIndex(i,j) for i in 1:N, j in 1:N if i <= j] #indices of the lower triangular part of a matrix
chol = zeros(N,N)
for (lidx,cidx) in enumerate(tril_indices)
chol[cidx] = cholesky_vector[lidx]
end
Sigma = chol*chol' #reconstruction of covariance matrix
sum(Sigma) # suming elements to get a scalar function, just to see if the gradient works.
end
g = gradient(sigma, cholesky_vector)
And I get an error telling me that mutating arrays are not supported due to the setindex!. I don’t really know how to deal with this, pytorch seems to handle it. Anyone knows a trick to do this ?
Thanks a lot