Running out of RAM with repeated Cholesky solves

Hello,

I have a system in which I need to repeatedly solve an electrostatics problem with an evolving charge configuration. When I left my program running overnight, I came back to find that RAM usage had grown to around 30 GB and the program had slowed to a crawl. I believe I’ve located the problem line:
ldiv!(solution, stiff_factorization, force_vector)
When I comment this out, the program completes with well under 1 GB of RAM allocated. I’m wondering whether I need to manually clean up any memory used by the solver. I tried to create a small example of what I’m working with, but memory doesn’t seem to accumulate in the same way. I’m not very familiar with the workings of the garbage collector, so any advice on avoiding a memory leak over longer runs is appreciated. Thank you!

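using JLD2
using LinearAlgebra

# Load the precomputed stiffness matrix and factor it once
stiffness_matrix = load("./dev/ldiv_memory/stiff_mat.jld2", "stiff_mat")
issymmetric(stiffness_matrix)  # sanity check; the result is not used below
factorization = cholesky(stiffness_matrix)

dim = size(stiffness_matrix, 1)
sol = zeros(dim)

# Repeatedly solve in place against fresh right-hand sides
for i in 1:1_000_000
    force_vector = rand(dim)
    ldiv!(sol, factorization, force_vector)
end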

What Julia version are you using? I fixed something similar, and the fix landed in 1.12.5, so could you try that version?


I just ran juliaup update and then up in the package manager. I’m on the 1.12.5 release, but Task Manager (Windows) still shows memory increasing. Previously, I tried using cg!, which seemed to reduce the memory load, but it also runs more slowly for my matrix.

You really shouldn’t need to manually clean up memory. In the Windows task manager, what memory metric are you looking at? (Depending on how you measure memory it can be deceptive due to virtual memory.)
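One way to cross-check the Task Manager number is to log memory from inside Julia. The following is only a sketch: it assumes a small random SPD matrix as a stand-in for the real stiffness matrix, and `report_memory` is a hypothetical helper name. It prints the bytes the garbage collector considers live next to the peak resident set size of the process.

```julia
using LinearAlgebra
using Random

# Hypothetical helper: GC-tracked live bytes and peak resident set size, both in MiB.
function report_memory()
    live = Base.gc_live_bytes() / 2^20
    peak = Sys.maxrss() / 2^20
    return (live = live, peak = peak)
end

# Stand-in problem (assumption): a small dense SPD matrix, not the real stiffness matrix.
n = 200
B = rand(n, n)
A = Symmetric(B' * B + n * I)
F = cholesky(A)
sol = zeros(n)
rhs = zeros(n)

for i in 1:10_000
    rand!(rhs)            # refill the right-hand side in place
    ldiv!(sol, F, rhs)    # same call pattern as in the original post
    if i % 2_500 == 0
        m = report_memory()
        println("iter $i: live = $(round(m.live; digits=1)) MiB, ",
                "peak RSS = $(round(m.peak; digits=1)) MiB")
    end
end
```

If the live bytes stay roughly flat while the resident size keeps climbing, that would point toward memory held outside Julia's garbage collector (for example by a native library) rather than toward Julia objects piling up. Note that `Sys.maxrss()` reports a peak, so it can only grow.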

Is stiffness_matrix a dense matrix or a sparse one?

(In the latter case, it might be a memory leak in the external CHOLMOD library that we are calling for sparse Cholesky solves? Though this would be surprising… that library has been around for a long while.)
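A quick way to answer the dense-versus-sparse question is to inspect the matrix with `issparse` and look at the type of the factorization object. This sketch uses a small tridiagonal SPD matrix as a stand-in (an assumption, not the real `stiff_mat`) and assumes a recent Julia where the sparse factorization lives in the SparseArrays standard library.

```julia
using LinearAlgebra
using SparseArrays

# Stand-in matrices (assumption): the same SPD tridiagonal matrix in both storage formats.
n = 5
A_sparse = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
A_dense = Matrix(A_sparse)

println(issparse(A_sparse))   # true
println(issparse(A_dense))    # false

# The factorization type shows which backend performs the solves:
# a dense matrix gives a LinearAlgebra.Cholesky (LAPACK),
# a sparse matrix gives a CHOLMOD factor.
F_dense = cholesky(A_dense)
F_sparse = cholesky(A_sparse)
println(typeof(F_dense))
println(typeof(F_sparse))

# Both backends should agree on the solution.
b = ones(n)
x_dense = F_dense \ b
x_sparse = F_sparse \ b
println(x_dense ≈ x_sparse)
```

Running `typeof(stiff_factorization)` in the real program gives the same information without touching the matrix itself.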

Are you sure it’s not something else? For example, if NaNs have crept into your vectors, then things will slow to a crawl due to floating-point exceptions.
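A cheap guard against this is to check the vectors for non-finite entries before (or after) each solve. A minimal sketch, where `check_finite` is a hypothetical helper and the vectors are made-up examples:

```julia
# Hypothetical helper: throws an error if any entry is NaN or Inf.
function check_finite(name, v)
    all(isfinite, v) || error("$name contains non-finite entries")
    return nothing
end

force_vector = [1.0, 2.0, 3.0]
check_finite("force_vector", force_vector)   # passes silently

bad = [1.0, NaN, 3.0]
println(any(isnan, bad))          # true
println(count(!isfinite, bad))    # 1
```

Calling such a check on `force_vector` and `solution` every few thousand iterations costs little compared with the solve and would catch the first iteration at which the data goes bad.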

Does the example code you posted exhibit the problem (if you use it with your stiff_mat.jld2) or not?
