Storage allocation in LAPACK.potrf!


#1

This is a duplicate of https://groups.google.com/forum/#!topic/julia-users/u9Tjpg19e9k . I decided to move the conversation to discourse because it allows for formatting of code.

I am encountering an unexpected amount of storage allocation in the cfactor!{A::HBlkDiag) method in the MixedModels package. See
https://github.com/dmbates/MixedModels.jl/blob/master/src/cfactor.jl for the code.

An HBlkDiag matrix is a homogeneous block diagonal matrix where “homogeneous” refers to the fact that all the diagonal blocks are square and of the same size. Because of this homogeneity the blocks can be stored in an r x r x k array where r is the size of each of the square block and k is the number of such blocks.

On entry to this method the blocks are symmetric, positive semi-definite. I want to overwrite the upper triangle of each of these blocks with its Cholesky factor. I call LAPACK.potrf! directly because I don’t want cholfact! to throw a non-positive-definite error. The strange thing to me is that when I monitor the storage allocation, I get a huge amount of storage being allocated in the line with that call. This may be because LAPACK.potrf! returns a tuple of the original matrix and an Int (the info code) but I’m not sure.

To see an example of this unusual amount of allocation try the following code with julia --track-allocation=user

using Feather, MixedModels
cd(Pkg.dir("MixedModels", "test", "data"))
sleepstudy = Feather.read("sleepstudy.feather", nullable=false)
fm1 = fit!(lmm(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy))
Profile.clear_malloc_data()
devs, vars, betas, thetas = bootstrap(10_000, fm1)

I get

        - function cfactor!{T}(A::HBlkDiag{T})
        -     Aa = A.arr
        0     r, s, t = size(Aa)
        0     if r ≠ s
        0         throw(ArgumentError("HBlkDiag matrix A must be square"))
        -     end
 94428000     scm = Array(T, (r, r))
        0     for k in 1 : t  # FIXME: Lots of allocations in this loop
        0         for j in 1 : r, i in 1 : j
        0             scm[i, j] = Aa[i, j, k]
        -         end
566568000         LAPACK.potrf!('U', scm)
        0         for j in 1 : r, i in 1 : j
        0             Aa[i, j, k] = scm[i, j]
        -         end
        -     end
 10492000     UpperTriangular(A)
        - end

In this case the HBlkDiag matrix being decomposed is 36 by 36 consisting of 18 2 by 2 diagonal blocks. scm is a scratch 2 by 2 matrix that is overwritten in sequence by the upper triangle of each of the original 2 by 2 blocks and passed to LAPACK.potrf!


#2

LAPACK.potrf! allocates a Ref to store the info value from LAPACK. The allocation in potrf! is 6x that of the allocation for scm and since the loop length is 18 the allocation in potrf! seems about right to me. The numbers are probably high because cfactor! is called many times.


#3

This makes it seem worthwhile to allocate my own BLASInt vector of length 1 and call dpotrf! directly.


@allocate in REPL, different results, same code
#4

Are the matrices always small? It might be worth considering StaticArrays although I don’t know if you can bypass the failure as easily when the matrices are not numerically positive definite.


#5

For the record, in 0.7.0-dev the Ref{Float64} does not appear to trigger a heap allocation.