I am working on a finite element program that solves a nonlinear problem by incrementing the boundary conditions. This means that from time to time I need to assemble some quite large stiffness matrices (\approx 350k \times 350k) and solve the resulting linear algebra problems.

I am surprised by how much memory the program actually uses compared to what I expected, and I am trying to understand why.

Before starting the main loop of the program, after the mesh has been read from the input file and all the necessary data structures have been created, I check the memory occupancy of all my variables with the @show_local macro by @cgeoga from this post. I get a total usage of about 1.5 GB, which is what I expected for a 113k-element model, but the output from `pmap` for that process says 30 GB!

But that's not all: as the calculations proceed, the memory usage reported by `pmap` grows seemingly out of control, exceeding 100 GB after two hundred increments, at which point the program gets killed by the OS.

During the main loop the program needs to update a few stiffness matrices and solve linear problems, but everything happens inside functions, so temporary variables should be taken care of by the garbage collector; no additional matrices or vectors are created, the existing ones are updated in place.
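Schematically, the in-place update looks something like this (a simplified sketch, not my actual code; `reassemble!` and the triplet list are illustrative names):

```julia
using SparseArrays

# Sketch: reuse the same sparse matrix across increments by zeroing its
# stored values and re-accumulating, instead of building a new matrix.
# `K` keeps its sparsity pattern, so no large buffers should be allocated.
function reassemble!(K::SparseMatrixCSC, element_contributions)
    fill!(nonzeros(K), 0.0)          # zero the stored entries in place
    for (i, j, v) in element_contributions
        K[i, j] += v                 # entry already in the pattern: no realloc
    end
    return K
end
```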

I tried to force garbage collection by invoking `GC.gc()` at the end of every load increment, but nothing changed.

I am using `SparseArrays` to store the stiffness matrix and a `LinearAlgebra.cholesky` factorization to solve the linear algebra problems; I also store the Cholesky factorization for re-use.

Everything runs on a node of a cluster, where I asked for two cores and 96 GB of memory. Can anyone help me understand what is happening, and whether it is possible to contain the memory usage of a program of this type?