Extreme memory usage in nested loop

I need to conduct extensive memory and performance testing on a memory-intensive function that creates mutable structs, temporary arrays, and more, even though most of the variables are type-defined. Running a single instance of the function works fine; my 16GB of RAM can handle it. However, the problem arises when I repeatedly call this function to measure execution time and memory consumption in various sections of the code. To do this, I’m using TimerOutputs since adapting my code to use BenchmarkTools would be too cumbersome, considering its size.
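For context, the per-section timing looks roughly like this; a minimal sketch assuming TimerOutputs.jl is installed (the function body and section labels here are hypothetical, not my actual code):

```julia
# Minimal sketch of TimerOutputs-style section timing; the function body
# and section labels are hypothetical, not the actual test code.
using TimerOutputs

const to = TimerOutput()

function handle_example(n)
    # Each @timeit records time and allocations for its labeled section
    @timeit to "allocate" x = rand(n, n)
    @timeit to "multiply" y = x * x
    return sum(y)
end

handle_example(200)
print_timer(to)   # table of time and allocations per labeled section
```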

The issue I’m facing is that memory consumption seems to accumulate between function calls, eventually causing my Ubuntu 20.04 system to crash. Consequently, I’m restricted in how many tests I can run before hitting a certain memory threshold, even though each individual test runs successfully.

Here’s an example of what my code looks like:

function memuse()
    # Resident set size of the current process, in MiB
    return parse(Int, split(read(`ps -p $(getpid()) -o rss`, String))[2]) / 1024
end

nrepeats = 10
log = Dict()
inputs1 = ProblemInput(1, 1000)
inputs2 = ProblemInput(2, 100)
problem1 = Problem(handle1, inputs1)
problem2 = Problem(handle2, inputs2)
problems = [problem1, problem2]
for i in 1:length(problems)
    for input in problems[i].input
        processed_input = process(input)
        for j in 1:nrepeats
            memuse = Helpers.memuse()
            if memuse > 1.0e4
                ccall(:malloc_trim, Cvoid, (Cint,), 0)
                if memuse > 1.3e4 # Threshold, more than this will crash
                    throw("Memory consumption too high!")
                end
            end
            obj = problems[i].handle(processed_input)
            save_measures(i, obj.data, log)
            obj = nothing
        end
    end
end

In this code snippet, I iterate over a set of problems and inputs, repeatedly calling the corresponding ‘handle’ function within each problem to measure its performance. Inside these ‘handle’ functions there is a significant amount of memory-intensive work, most of which is caused by temporary SArrays. The persistent data (e.g., the problems and inputs) does not contribute significantly to the problem: these data structures incur minimal memory allocation compared to the allocations made within the ‘handle’ function. Unfortunately, refactoring the code at this point is not ideal, as it’s designed to closely match a Python and a MATLAB version for direct performance comparison.

The issue at hand is that, even within the innermost for loop, memory allocation accumulates with each iteration of the same ‘handle’ function. To illustrate: if a single function call allocates 1GB of memory and nrepeats = 10, the final memory consumption reaches around 10GB. Moreover, even when the problem changes, the memory allocated for previous problems still lingers. This means that even for smaller inputs, when conducting numerous repeats, the program eventually crashes. To avoid having to restart the computer, I’ve implemented the memuse bit of the code.

So, my primary question is: Is there a way to prompt Julia or Ubuntu to promptly release the memory allocated during these function calls? I’ve tried GC.gc(true) and ccall(:malloc_trim, Cvoid, (Cint,), 0), but their effect is very limited, typically freeing only around 100MB, even when the function allocates up to 2GB of memory on the largest input. Is there any solution beyond refactoring my code?
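For reference, the two release attempts mentioned above can be combined into one helper; a minimal sketch, assuming Linux with glibc (malloc_trim is glibc-specific, and only memory with no remaining live references can actually be returned to the OS):

```julia
# Sketch: run a full GC pass, then ask glibc to return freed heap pages
# to the OS. Only memory without live references can be reclaimed this way.
function release_memory()
    GC.gc(true)                             # full (non-incremental) collection
    ccall(:malloc_trim, Cvoid, (Cint,), 0)  # glibc-specific; Linux only
end
```

If this still frees only a fraction of what was allocated, something is most likely still holding a live reference to the data.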

If you need more specific details or if my explanation is insufficient, please let me know, and I can provide further information or the complete code.

You call this a function, but the code snippet looks like almost everything is in the global scope. Is there more to this code, like it being nested in a function block or a module structure? How are you running it repeatedly and benchmarking its performance? If it were an actual function, you could use BenchmarkTools on the function call; it doesn’t matter how large a method body is.

Yes, the entire code snippet is encapsulated within a function, and there are no global variables used anywhere in the entire code. The example was really just for illustration purposes. The problem with using BenchmarkTools is that I require measurements of execution time and memory consumption within specific sections of the ‘handle’ code, rather than just measuring the method call itself. TimerOutputs proved to be the easiest tool for this, generating a nice table and being easy to implement.

Every component of the code is modularized and encapsulated: the tests, the ‘Problem’ type as a mutable struct and its “methods”, the ‘Input’ type as a struct, and the ‘handle’ function itself.

Without the full code, including how you’re running the overall function repeatedly to a crash, it’ll be impossible to spot the memory leak. It’d be nice to reduce the whole thing to a short MWE with the same issue but I imagine that’s not straightforward.

Most I can comment now is that if GC.gc only frees a fraction of memory allocated in the same iteration, then there is probably a live reference (variable, container element) to the unfreed memory. For example, the log dictionary persists across all iterations, so anything you put in there will not be touched by the garbage collector, though it’s not clear how you’re doing it because you don’t provide the code for save_measures.
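To illustrate the point about live references, here is a minimal sketch (the names are illustrative, not from your code):

```julia
# A persistent container keeps everything it references alive across
# iterations; storing only a scalar summary lets the GC reclaim the rest.
log_refs = Dict{Int, Vector{Float64}}()  # retains each full result
log_scalars = Dict{Int, Float64}()       # retains an 8-byte summary only

for i in 1:3
    result = rand(10^6)           # stands in for obj = handle(processed_input)
    log_refs[i] = result          # this reference keeps ~8 MB alive
    log_scalars[i] = sum(result)  # only the 8-byte sum is kept here
end

Base.summarysize(log_refs)     # roughly 24 MB retained
Base.summarysize(log_scalars)  # a few hundred bytes
```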

I also have to ask again what you mean exactly by “function call”. You said

To illustrate, if a single function call allocates 1GB of memory and nrepeats = 10, the final memory consumption would reach around 10GB.

which implies you are referring to a function inside the for j in 1:nrepeats loop. But now you’re saying the entire code snippet is inside a function. So which function are you repeating and benchmarking? This is why it’s important to post that code, it’s not clear what you’re doing.

The measuring is happening inside of

obj = problems[i].handle(processed_input)
save_measures(i, obj.data, log) # This simply saves the data inside of log 

I’m sorry for the miscommunication: I call the entire snippet of code the ‘test’, and the thing that allocates a lot of memory and is the object of measurement is problems[i].handle, which is a function defined in an outside module. The i-th problem corresponds to a particular function handle, but only for reference (I believe).

I will put the entire code of this respective part here, but it may serve only to confuse even more:

function run_tests(meshfiles :: Array, meshfiles_qfive_spot :: Array, nrepeats :: Int64 = 1, try_precompile :: Bool = true)
    if try_precompile
        meshf = "./mesh/cube_hex.msh"
        meshqfive = "./mesh/box_2.msh"
        qfive_spot(MeshHandler.Mesh(meshqfive), (6, 4, 1))
    end
    p1 = Helpers.Problem("linear_case", linear_case, length(meshfiles) > 5 ? meshfiles[1:5] : meshfiles)
    p2 = Helpers.Problem("quadratic_case", quadratic_case, length(meshfiles) > 5 ? meshfiles[1:5] : meshfiles)
    p3 = Helpers.Problem("extra_case1", extra_case1, meshfiles)
    p4 = Helpers.Problem("extra_case2", extra_case2, meshfiles)
    p5 = Helpers.Problem("qfive_spot", qfive_spot, meshfiles_qfive_spot)
    meshes_dict = Dict()
    for meshfile in [meshfiles ; meshfiles_qfive_spot]
        mesh = MeshHandler.Mesh(meshfile)
        meshes_dict[meshfile] = mesh
    end
    problems = [p1, p2, p3, p4, p5]
    verbose = false
    for i in 1:length(problems)
        k = 1
        solver_name = ""
        for meshfile in problems[i].meshfiles
            mesh = meshes_dict[meshfile]
            for j in 1:nrepeats
                memuse = Helpers.memuse()
                if memuse > 1.0e4
                    counter = 0
                    while Helpers.memuse() > 1.0e4
                        counter += 1
                        if counter > 3
                            break
                        end
                    end
                    if memuse > 1.3e4
                        throw("Memory consumption too high!")
                    end
                end
                if problems[i].name != "qfive_spot"
                    solver = problems[i].handle(mesh, verbose)
                else
                    solver = problems[i].handle(mesh, (6, 4, 1), verbose)
                end
                solver_name = solver.name
                Helpers.add_to_problem!(problems[i], k, j, solver.mesh.nvols, solver.error, solver.times, solver.memory)
                solver = nothing
            end
            k += 1
        end
        pr          = problems[i]
        name        = solver_name
        meshfiles   = pr.meshfiles
        nvols       = pr.nvols_arr
        error       = pr.avg_error_arr
        times       = pr.avg_time_arr
        memory      = pr.avg_memory_arr

        d = Dict("name"         => name,
                 "meshfiles"    => meshfiles,
                 "nvols"        => nvols,
                 "error"        => error,
                 "times"        => times,
                 "memory"       => memory)

        JLD2.jldsave("./results/$(pr.name).jld2"; d)
    end
end
Perhaps the correspondence with my example is clear now.

Of course, there’s a lot of shared state, but I made sure that none of the information contained in the object returned by problems[i].handle is referenced anywhere else. In the bits where it is passed as an argument (the add_to_problem! part in the real code and the save_measures part in the example), it is not copied. Even if it were, the times, memory, and error parameters pale in comparison to some of the temporary arrays declared inside the handle itself, so they cannot be the problem. The rest that is indeed shared (the Input and log parts) is also pretty small, not even in the range of tens of megabytes.

But, it is true that the handle references the Input, which persists. Maybe that’s the problem?

You’re right, that still doesn’t really help us, because the Problem struct, the various things you assign to the handle field, and save_measures aren’t provided. The run_tests function doesn’t even have a save_measures call or a log variable.

I don’t know what you’re talking about: there’s no Input variable anywhere, and it’s not clear which field is handle, let alone what they are.

If it’s too difficult to provide the necessary information for us to know what is happening, you can try logging some Base.summarysize(obj) calls across iterations to check if live references’ memory usage is blowing up across iterations. In your first example, I might log Base.summarysize.([log, inputs1, inputs2, problem1, problem2]).
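Something along these lines; a rough sketch with placeholder data (in your case the tracked objects would be log, the problems, and so on):

```julia
# Sketch: record Base.summarysize of a persistent container each iteration;
# steady growth means live references, not temporaries, retain the memory.
measure_log = Dict{Int, Vector{Float64}}()  # stands in for the `log` dictionary
sizes = Int[]

for j in 1:5
    measure_log[j] = rand(10^4)   # stand-in for save_measures(i, obj.data, log)
    push!(sizes, Base.summarysize(measure_log))
end

println(sizes)   # roughly linear growth, ~80 kB per iteration
```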

Sorry again for the miscommunication. Since the first bit of code is only an example meant to resemble what I’m actually doing, I named the variables and methods with names I thought would better communicate the general purpose of the code, rather than using the exact names from the original code, which would make even less sense given the lack of context. So, of course, it is not clear exactly which methods and variables correspond to which.

Anyway, it is indeed difficult to really show everything here, since the size of the modules and all the methods used are immense and would require at least a couple of hours to be analyzed, let alone understood (I know clarity isn’t one of my best qualities).

I’ll do what you told me, and come back with the results. Thanks a lot for the dedicated time.

If the function you are calling uses threads, this problem may be related to 1.9’s memory management issues (supposedly to be resolved in 1.10, some say). Cf: