How to preallocate a triangular matrix and overwrite it with cholesky!

Hi everyone, I am trying to improve my code’s efficiency. I have located a line that seems too expensive, but I cannot figure out how to improve it.

In brief, I need to store the inverse of the N by N covariance matrix of N realizations of a Gaussian process with a Matern covariogram. In the code chunk below, the function Maternlu computes the Matern covariance between two points separated by a distance x, coords holds the coordinates of all realizations, and the matrix M stores what I need.

using Distributions
using LinearAlgebra
using Distances
using SpecialFunctions

function Maternlu(x; Ξ½=1.0, Ο• = 6.0, Οƒ2 = 1.0)
    
    # Matern covariance at distance x (returns Οƒ2 when x == 0)
    
    if x == 0.0
        Οƒ2
    else
        a = Ο• * x;
        Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (a)^Ξ½ * besselk(Ξ½, a)
    end
end

N = 400;
coords = rand(2, N); #coordinates of observations

M = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords, dims = 2)), 
            Ξ½ = 1.0, Ο• = 6.0))));

The last line allocates around 6 MB, but the matrix M itself only needs about 1.2 MB. Moreover, I only need the upper-triangular part of M.

@benchmark M = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords, dims = 2)), 
            Ξ½ = 1.0, Ο• = 6.0))))

BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  39.158 ms … 83.481 ms  β”Š GC (min … max): 0.00% … 3.10%
 Time  (median):     80.759 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   80.414 ms Β±  5.400 ms  β”Š GC (mean Β± Οƒ):  0.27% Β± 0.73%

                                                        β–„β–ˆβ–    
  β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–ˆβ–ˆβ–ˆβ–†β–ˆ ▁
  39.2 ms         Histogram: frequency by time        83.3 ms <

 Memory estimate: 6.10 MiB, allocs estimate: 79819.

I think the ideal code would do the following:

  1. preallocate an N by N upper-triangular matrix
  2. overwrite it with the upper-triangular part of the distance matrix among the N locations
  3. overwrite it with the upper-triangular part of the Matern covariance
  4. use function like cholesky! to overwrite this matrix with the Cholesky decomposition
  5. overwrite it with the upper-triangular part of the inverse of the Matern covariance

But I don’t know how to preallocate an upper-triangular matrix, and it seems that the current cholesky! does not accept an upper-triangular matrix as input. Any thoughts on what I can do to improve my code? Many thanks!
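For concreteness, this is roughly the shape I have in mind for steps 1-3, using a plain dense buffer since I don't know how to allocate only the triangle (fill_upper_matern! is just a name I made up, and the sketch is untested):

# Steps 1-3: preallocate a dense N x N buffer once and fill its upper
# triangle with Matern covariances (uses the packages loaded above).
buf = Matrix{Float64}(undef, N, N)          # step 1: preallocate

function fill_upper_matern!(A, coords; Ξ½ = 1.0, Ο• = 6.0, Οƒ2 = 1.0)
    n = size(A, 2)
    for j in 1:n, i in 1:j
        d = euclidean(view(coords, :, i), view(coords, :, j))   # step 2: distance
        A[i, j] = d == 0.0 ? Οƒ2 :                                # step 3: Matern
            Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (Ο• * d)^Ξ½ * besselk(Ξ½, Ο• * d)
    end
    return A
end

fill_upper_matern!(buf, coords)
# steps 4-5 (cholesky! and inverting in place) are where I am stuck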

An UpperTriangular is just a wrapper type around a normal matrix:

julia> fieldtypes(UpperTriangular)
(AbstractMatrix,)

So does that mean I cannot save memory by defining an UpperTriangular matrix?

Triangular matrix types typically don’t save memory, since packed triangular storage would make operations slower.
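For example, a quick check (just illustrative) shows the wrapper still carries the full dense parent array:

julia> A = rand(400, 400);

julia> U = UpperTriangular(A);

julia> parent(U) === A   # same underlying N x N storage, nothing is packed
true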


I see. No wonder I could hardly improve my code by using a triangular matrix. Thanks a lot!

I wrote a function MaternluU! that overwrites the upper triangle of a distance matrix with the Matern covariances:

function MaternluU!(A::AbstractMatrix; Ξ½=1.0, Ο• = 6.0, Οƒ2 = 1.0)
    
    # Overwrite the upper triangle of A (pairwise distances) with Matern covariances
    m,n = size(A)
    @boundscheck m == n || throw(BoundsError())
    a = 0.0;
    
    @inbounds for j in 1:n
        for i in 1:j
            if A[i,j] == 0.0
                A[i,j] = Οƒ2
            else
                a = A[i,j] * Ο•;
                A[i,j] = Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (a)^Ξ½ * besselk(Ξ½, a)
            end
        end
    end
    return A
end

It slightly reduces the memory cost.

function computeM(coords)
    M = pairwise(Euclidean(), coords, dims = 2);
    MaternluU!(M);
    M = Symmetric(M, :U);
    M = cholesky(M);
    M = inv(M);
end
@benchmark computeM(coords)
BenchmarkTools.Trial: 62 samples with 1 evaluation.
 Range (min … max):  38.149 ms … 86.843 ms  β”Š GC (min … max): 0.00% … 1.96%
 Time  (median):     82.426 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   81.841 ms Β±  5.785 ms  β”Š GC (mean Β± Οƒ):  0.21% Β± 0.64%

                                                        β–ˆ      
  β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–‚β–β–β–‚β–β–…β–ˆβ–†β–ƒβ–ƒβ–ƒ ▁
  38.1 ms         Histogram: frequency by time        85.9 ms <

 Memory estimate: 4.88 MiB, allocs estimate: 79809.


It seems that with the code

M = pairwise(Euclidean(), coords, dims = 2);
MaternluU!(M);
M = Symmetric(M, :U);
cholesky!(M)

I can overwrite the upper-triangular part of M with the Cholesky decomposition of the covariance matrix. But how can I then efficiently compute the inverse of the covariance matrix from that factorization? Many thanks!

Do you actually need the inverse?
Note that you can do X / M or M \ Y, which are equivalent to X * inv(M) and inv(M) * Y, respectively, but much faster and more numerically stable.
Also, see ldiv! and rdiv! for in-place versions.

Also, logdet should work as well, and be far faster.
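Something like this (an illustrative sketch with a made-up SPD matrix S):

# made-up symmetric positive definite matrix and right-hand side
A = rand(5, 5);
S = Symmetric(A' * A + 5I);
y = rand(5);

C = cholesky(S);

x = C \ y;      # same result as inv(S) * y, without forming the inverse
ldiv!(C, y);    # in-place version: overwrites y with C \ y
logdet(C)       # log-determinant straight from the factorization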

EDIT:
I really don’t recommend it, but LinearAlgebra.inv!(cholesky!(M)) should work.
Also, if you have an Intel CPU (or even an AMD CPU), using MKL is likely to speed these operations up by quite a bit.


Do you actually need the inverse?

Yes, I need it to construct a sequence of matrices in my algorithm. Here is the part of my code where I need this inverse: the inverse matrix is invR_nk, and I use it to construct chol_inv_M inside a for loop.

    invR_nk = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords[:, CV_ind_ls[k]], dims = 2)), 
            Ξ½ = nu_pick, Ο• = phi_pick))));
    R_k_nk = Maternlu.(pairwise(Euclidean(), coords[:, CV_ind_hold_ls[k]], 
            coords[:, CV_ind_ls[k]], dims = 2), Ξ½ = nu_pick, Ο• = phi_pick);
    
    for i2 in 1:L_grid_deltasq
        deltasq_pick = deltasq_grid[i2];
        if p == 0
            chol_inv_M = cholesky(invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k]))); 
            u = y[CV_ind_ls[k]] /  deltasq_pick;
        else
            chol_inv_M = cholesky(Symmetric(vcat(
                        hcat(XTX_list[k] / deltasq_pick + inv_V_Ξ², X[:, CV_ind_ls[k]] / deltasq_pick),
                        hcat(zeros(nk_list[k], p), 
                            invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k])))), :U));
            u = [inv_V_ΞΌ_Ξ² + XTy_list[k] / deltasq_pick; y[CV_ind_ls[k]] /  deltasq_pick];
        end

        ldiv!(chol_inv_M.U', u);  # u2 = chol_inv_M.L \ u;

LinearAlgebra.inv! should be fine

julia> LinearAlgebra.inv!(cholesky(Symmetric(A)))
4Γ—4 Matrix{Float64}:
  0.334048   -0.0773346  -0.0323587  -0.0599437
 -0.0773346   0.211738    0.0864574   0.0753268
 -0.0323587   0.0864574   0.14029     0.125532
 -0.0599437   0.0753268   0.125532    0.467126

julia> inv(A)
4Γ—4 Matrix{Float64}:
  0.334048   -0.0773346  -0.0323587  -0.0599437
 -0.0773346   0.211738    0.0864574   0.0753268
 -0.0323587   0.0864574   0.14029     0.125532
 -0.0599437   0.0753268   0.125532    0.467126

Also:

invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k]))

this could be
invR_nk + (1 / deltasq_pick) * I
to avoid allocating the two temporary vectors created by [1 / deltasq_pick] and repeat.
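For example (made-up size, just to show the two forms agree):

julia> A = rand(4, 4); Ξ΄ = 0.25;

julia> A + Diagonal(repeat([1 / Ξ΄], 4)) == A + (1 / Ξ΄) * I
true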

You could also replace all the vcat and hcats with an initial allocation of an array of the final size, and copy data directly into it.
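Something along these lines (a rough sketch with hypothetical stand-ins for XTX_list[k], X[:, CV_ind_ls[k]], invR_nk, and the sizes):

# Allocate the (p + nk) x (p + nk) matrix once and fill its blocks in place
# instead of rebuilding it with vcat/hcat on every iteration.
p, nk, Ξ΄ = 3, 10, 0.5;
XTX_k   = rand(p, p);
inv_V_Ξ² = Matrix(1.0I, p, p);
Xk      = rand(p, nk);
invR_nk = rand(nk, nk);

B = Matrix{Float64}(undef, p + nk, p + nk)   # reuse this buffer across iterations

B[1:p,     1:p]     .= XTX_k ./ Ξ΄ .+ inv_V_Ξ²
B[1:p,     p+1:end] .= Xk ./ Ξ΄
B[p+1:end, 1:p]     .= 0.0
B[p+1:end, p+1:end] .= invR_nk
for i in 1:nk
    B[p+i, p+i] += 1 / Ξ΄       # the Diagonal(1/Ξ΄) term, added in place
end

# then, as before: chol_inv_M = cholesky(Symmetric(B, :U))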


Terrific! Thank you so much, I think that is what I want. Now the memory cost drops from 6.1 MB to 2.44 MB.

function computeM(coords)
    M = pairwise(Euclidean(), coords, dims = 2);
    MaternluU!(M);
    M = Symmetric(M, :U);
    LinearAlgebra.inv!(cholesky!(M)) 
end
@benchmark computeM(coords)
BenchmarkTools.Trial: 60 samples with 1 evaluation.
 Range (min … max):  80.800 ms … 87.584 ms  β”Š GC (min … max): 0.00% … 3.06%
 Time  (median):     84.594 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   84.614 ms Β±  1.068 ms  β”Š GC (mean Β± Οƒ):  0.14% Β± 0.59%

                                     β–†β–†β–‚β–‚ β–ˆβ–‚                   
  β–†β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–ˆβ–„β–ˆβ–„β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–ˆβ–ˆβ–ˆβ–†β–β–β–„β–β–β–β–β–β–β–†β–β–β–β–„β–„ ▁
  80.8 ms         Histogram: frequency by time        86.8 ms <

 Memory estimate: 2.44 MiB, allocs estimate: 79805.