How to preallocate a triangular matrix and overwrite it with cholesky!

Hi everyone, I am trying to improve my code’s efficiency. I have located a line that seems too expensive, but I cannot figure out how to improve it.

In brief, I need to store the inverse of the N by N covariance matrix of N realizations of a Gaussian process with a Matern covariogram. In the code chunk below, the function Maternlu computes the Matern covariance between two points separated by a distance x, coords holds the coordinates of all realizations, and the matrix M stores what I need.

using Distributions
using LinearAlgebra
using Distances
using SpecialFunctions

function Maternlu(x; Ξ½=1.0, Ο• = 6.0, Οƒ2 = 1.0)
    
    # Matern covariance at distance x (returns Οƒ2 when x == 0)
    
    if x == 0.0
        Οƒ2
    else
        a = Ο• * x;
        Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (a)^Ξ½ * besselk(Ξ½, a)
    end
end

N = 400;
coords = rand(2, N); #coordinates of observations

M = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords, dims = 2)), 
            Ξ½ = 1.0, Ο• = 6.0))));

The last line allocates around 6 MB, but the matrix M itself only needs about 1.2 MB. Moreover, I only need the upper-triangular part of M.

@benchmark M = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords, dims = 2)), 
            Ξ½ = 1.0, Ο• = 6.0))))

BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  39.158 ms … 83.481 ms  β”Š GC (min … max): 0.00% … 3.10%
 Time  (median):     80.759 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   80.414 ms Β±  5.400 ms  β”Š GC (mean Β± Οƒ):  0.27% Β± 0.73%

                                                        β–„β–ˆβ–    
  β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–ˆβ–ˆβ–ˆβ–†β–ˆ ▁
  39.2 ms         Histogram: frequency by time        83.3 ms <

 Memory estimate: 6.10 MiB, allocs estimate: 79819.

I think the ideal code would do the following:

  1. preallocate an N by N upper-triangular matrix
  2. overwrite it with the upper-triangular part of the distance matrix among the N locations
  3. overwrite it with the upper-triangular part of the Matern covariance
  4. use function like cholesky! to overwrite this matrix with the Cholesky decomposition
  5. overwrite it with the upper-triangular part of the inverse of the Matern covariance

But I don’t know how to preallocate an upper-triangular matrix, and it seems that the current cholesky! does not accept an upper-triangular matrix as input. Any thoughts on what I can do to improve my code? Many thanks!
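For concreteness, this is roughly the shape I have in mind for steps 1-3, using a plain dense buffer since I don't know how to allocate only the triangle (fill_upper_matern! is just a name I made up, and the sketch is untested):

# Steps 1-3: preallocate a dense N x N buffer once and fill its upper
# triangle with Matern covariances (uses the packages loaded above).
buf = Matrix{Float64}(undef, N, N)          # step 1: preallocate

function fill_upper_matern!(A, coords; Ξ½ = 1.0, Ο• = 6.0, Οƒ2 = 1.0)
    n = size(A, 2)
    for j in 1:n, i in 1:j
        d = euclidean(view(coords, :, i), view(coords, :, j))   # step 2: distance
        A[i, j] = d == 0.0 ? Οƒ2 :                                # step 3: Matern
            Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (Ο• * d)^Ξ½ * besselk(Ξ½, Ο• * d)
    end
    return A
end

fill_upper_matern!(buf, coords)
# steps 4-5 (cholesky! and inverting in place) are where I am stuck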

An UpperTriangular is just a wrapper type around a normal matrix:

julia> fieldtypes(UpperTriangular)
(AbstractMatrix,)

So does that mean I cannot save memory by defining an UpperTriangular matrix?

Triangular matrix types typically don’t save memory, since packed triangular storage would make operations slower.
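For example, a quick check (just illustrative) shows the wrapper still carries the full dense parent array:

julia> A = rand(400, 400);

julia> U = UpperTriangular(A);

julia> parent(U) === A   # same underlying N x N storage, nothing is packed
true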


I see. No wonder I could hardly improve my code by using a triangular matrix. Thanks a lot!

I wrote a function MaternluU! that overwrites the upper triangle of a distance matrix with the Matern covariances:

function MaternluU!(A::AbstractMatrix; Ξ½=1.0, Ο• = 6.0, Οƒ2 = 1.0)
    
    # Overwrite the upper triangle of A (pairwise distances) with Matern covariances
    m,n = size(A)
    @boundscheck m == n || throw(BoundsError())
    a = 0.0;
    
    @inbounds for j in 1:n
        for i in 1:j
            if A[i,j] == 0.0
                A[i,j] = Οƒ2
            else
                a = A[i,j] * Ο•;
                A[i,j] = Οƒ2 * 2^(1 - Ξ½) / gamma(Ξ½) * (a)^Ξ½ * besselk(Ξ½, a)
            end
        end
    end
    return A
end

It slightly reduces the memory cost.

function computeM(coords)
    M = pairwise(Euclidean(), coords, dims = 2);
    MaternluU!(M);
    M = Symmetric(M, :U);
    M = cholesky(M);
    M = inv(M);
end
@benchmark computeM(coords)
BenchmarkTools.Trial: 62 samples with 1 evaluation.
 Range (min … max):  38.149 ms … 86.843 ms  β”Š GC (min … max): 0.00% … 1.96%
 Time  (median):     82.426 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   81.841 ms Β±  5.785 ms  β”Š GC (mean Β± Οƒ):  0.21% Β± 0.64%

                                                        β–ˆ      
  β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–‚β–β–β–‚β–β–…β–ˆβ–†β–ƒβ–ƒβ–ƒ ▁
  38.1 ms         Histogram: frequency by time        85.9 ms <

 Memory estimate: 4.88 MiB, allocs estimate: 79809.


It seems that with the code

M = pairwise(Euclidean(), coords, dims = 2);
MaternluU!(M);
M = Symmetric(M, :U);
cholesky!(M)

I can overwrite the upper-triangular part of M with the Cholesky decomposition of the covariance matrix. But how can I then efficiently compute the inverse of the covariance matrix from that factorization? Many thanks!

Do you actually need the inverse?
Note that you can do X / M or M \ Y, which are equivalent to X * inv(M) and inv(M) * Y, respectively, but much faster and more numerically stable.
Also, see ldiv! and rdiv! for in-place versions.

Also, logdet should work as well, and be far faster.
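Something like this (an illustrative sketch with a made-up SPD matrix S):

# made-up symmetric positive definite matrix and right-hand side
A = rand(5, 5);
S = Symmetric(A' * A + 5I);
y = rand(5);

C = cholesky(S);

x = C \ y;      # same result as inv(S) * y, without forming the inverse
ldiv!(C, y);    # in-place version: overwrites y with C \ y
logdet(C)       # log-determinant straight from the factorization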

EDIT:
I really don’t recommend it, but LinearAlgebra.inv!(cholesky!(M)) should work.
Also, if you have an Intel CPU (or even an AMD CPU), using MKL is likely to speed these operations up by quite a bit.


Do you actually need the inverse?

Yes, I need it to construct a sequence of matrices in my algorithm. Here is the part of my code where I need this inverse: the inverse matrix is invR_nk, and I use it to construct chol_inv_M inside a for loop.

    invR_nk = inv(cholesky(Symmetric(Maternlu.(
            UpperTriangular(pairwise(Euclidean(), coords[:, CV_ind_ls[k]], dims = 2)), 
            Ξ½ = nu_pick, Ο• = phi_pick))));
    R_k_nk = Maternlu.(pairwise(Euclidean(), coords[:, CV_ind_hold_ls[k]], 
            coords[:, CV_ind_ls[k]], dims = 2), Ξ½ = nu_pick, Ο• = phi_pick);
    
    for i2 in 1:L_grid_deltasq
        deltasq_pick = deltasq_grid[i2];
        if p == 0
            chol_inv_M = cholesky(invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k]))); 
            u = y[CV_ind_ls[k]] /  deltasq_pick;
        else
            chol_inv_M = cholesky(Symmetric(vcat(
                        hcat(XTX_list[k] / deltasq_pick + inv_V_Ξ², X[:, CV_ind_ls[k]] / deltasq_pick),
                        hcat(zeros(nk_list[k], p), 
                            invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k])))), :U));
            u = [inv_V_ΞΌ_Ξ² + XTy_list[k] / deltasq_pick; y[CV_ind_ls[k]] /  deltasq_pick];
        end

        ldiv!(chol_inv_M.U', u);  # u2 = chol_inv_M.L \ u;

LinearAlgebra.inv! should be fine

julia> LinearAlgebra.inv!(cholesky(Symmetric(A)))
4Γ—4 Matrix{Float64}:
  0.334048   -0.0773346  -0.0323587  -0.0599437
 -0.0773346   0.211738    0.0864574   0.0753268
 -0.0323587   0.0864574   0.14029     0.125532
 -0.0599437   0.0753268   0.125532    0.467126

julia> inv(A)
4Γ—4 Matrix{Float64}:
  0.334048   -0.0773346  -0.0323587  -0.0599437
 -0.0773346   0.211738    0.0864574   0.0753268
 -0.0323587   0.0864574   0.14029     0.125532
 -0.0599437   0.0753268   0.125532    0.467126

Also:

invR_nk + Diagonal(repeat([1 / deltasq_pick], nk_list[k]))

this could be
invR_nk + (1 / deltasq_pick) * I
to avoid allocating the two temporary vectors created by [1 / deltasq_pick] and repeat.
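For example (made-up size, just to show the two forms agree):

julia> A = rand(4, 4); Ξ΄ = 0.25;

julia> A + Diagonal(repeat([1 / Ξ΄], 4)) == A + (1 / Ξ΄) * I
true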

You could also replace all the vcat and hcats with an initial allocation of an array of the final size, and copy data directly into it.
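Something along these lines (a rough sketch with hypothetical stand-ins for XTX_list[k], X[:, CV_ind_ls[k]], invR_nk, and the sizes):

# Allocate the (p + nk) x (p + nk) matrix once and fill its blocks in place
# instead of rebuilding it with vcat/hcat on every iteration.
p, nk, Ξ΄ = 3, 10, 0.5;
XTX_k   = rand(p, p);
inv_V_Ξ² = Matrix(1.0I, p, p);
Xk      = rand(p, nk);
invR_nk = rand(nk, nk);

B = Matrix{Float64}(undef, p + nk, p + nk)   # reuse this buffer across iterations

B[1:p,     1:p]     .= XTX_k ./ Ξ΄ .+ inv_V_Ξ²
B[1:p,     p+1:end] .= Xk ./ Ξ΄
B[p+1:end, 1:p]     .= 0.0
B[p+1:end, p+1:end] .= invR_nk
for i in 1:nk
    B[p+i, p+i] += 1 / Ξ΄       # the Diagonal(1/Ξ΄) term, added in place
end

# then, as before: chol_inv_M = cholesky(Symmetric(B, :U))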


Terrific! Thank you so much, I think that is what I want. Now the memory cost drops from 6.1 MB to 2.44 MB.

function computeM(coords)
    M = pairwise(Euclidean(), coords, dims = 2);
    MaternluU!(M);
    M = Symmetric(M, :U);
    LinearAlgebra.inv!(cholesky!(M)) 
end
@benchmark computeM(coords)
BenchmarkTools.Trial: 60 samples with 1 evaluation.
 Range (min … max):  80.800 ms … 87.584 ms  β”Š GC (min … max): 0.00% … 3.06%
 Time  (median):     84.594 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   84.614 ms Β±  1.068 ms  β”Š GC (mean Β± Οƒ):  0.14% Β± 0.59%

                                     β–†β–†β–‚β–‚ β–ˆβ–‚                   
  β–†β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–ˆβ–„β–ˆβ–„β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–ˆβ–ˆβ–ˆβ–†β–β–β–„β–β–β–β–β–β–β–†β–β–β–β–„β–„ ▁
  80.8 ms         Histogram: frequency by time        86.8 ms <

 Memory estimate: 2.44 MiB, allocs estimate: 79805.