High RAM usage

Hi

I’ve written code that calculates the value of a complex non-linear function for a set of parameters, and I need to find a fixed point of that function. Because it is computationally heavy, I use distributed computing. Below is my attempt at simplifying the code I built.

function model_wrapper!(pa::Param,pr::Prices,q_ret::AbstractArray{<:Real},q_act::AbstractArray{<:Real})

    while abs(diff_AE) > tol_out || abs(diff_h) > tol_out
        println("Starting inner loop")

        # initiate mtg price convergence variables
        diff_act_mtg = 1.0; diff_ret_mtg = 1.0

        # Inner loop: mortgage pricing function
        #--------------------------------------
        while diff_act_mtg > tol_in || diff_ret_mtg > tol_in

            # compute model
            out = model_compute(pa_new,pr_new,q_ret,q_act)

            # compute the norm between the mortgage pricing function
            diff_act_mtg,act_coor = findmax(broadcast(abs,out.q_act_new-q_act))
            diff_ret_mtg,ret_coor = findmax(broadcast(abs,out.q_ret_new-q_ret))

            # current distances
            println("")
            println("Convergence criteria")
            println("diff (housing)  = $diff_h")
            println("diff (mtg act)  = $diff_act_mtg")
            println("diff (mtg ret)  = $diff_ret_mtg")

            # update guesses of the mortgage pricing function
            if diff_act_mtg > tol_in || diff_ret_mtg > tol_in # update only if it has not converged
                q_ret       = (1.0-hom_q)*out.q_ret_new + hom_q*q_ret
                q_act       = (1.0-hom_q)*out.q_act_new + hom_q*q_act
            end

        end

        println("")
        println("---- Inner loop converged ----")

        if abs(diff_h) > tol_out # only update if loop did not converge
            pr_new.p_h   = hom_p_h*pr_new.p_h + (1.0-hom_p_h)*((mm.HD-mm.HS)/pa_new.Lbar)^(pa_new.φ/(1.0-pa_new.φ))*(1.0/(1.0-pa_new.φ))
            pr_new.p_h   = max(pr_new.p_h,0.01)
            pr_new.p_hp  = copy(pr_new.p_h)
        end

        println("p_h = ",pr_new.p_h)
        println("AE  = ",pa_new.AE)
    end

    # compute the model one last time to obtain the statistics consistent with the updates
    out = model_compute(pa_new,pr_new,q_ret,q_act)

    println("")
    println("---- Outer loop converged ----")

    return (q_ret=out.q_ret,q_act=out.q_act,pa_new=out.pa_new,pr_new=out.pr_new)

end

The function model_wrapper! takes guesses for p_h, q_ret and q_act, and computes a fixed point using a homotopy algorithm. model_compute is the non-linear function for which I’m calculating the fixed point. Essentially, it takes parameters and guesses for the variables and computes new values for those variables, which are then compared with the guesses. model_wrapper! repeats this process until convergence.

model_compute can be summarized as follows:

function model_compute(pa::Param,pr::Prices,q_ret::AbstractArray{<:Real},q_act::AbstractArray{<:Real})

    pa_new = deepcopy(pa)
    pr_new = deepcopy(pr)

    vn_act   = SharedArray{Float64}(nb,nϵ,na)          # value function
    cnf_act  = SharedArray{Float64}(nb,nϵ,na)          # consumption function
    bnf_act  = SharedArray{Float64}(nb,nϵ,na)          # asset function
    htnf_act = SharedArray{Float64}(nb,nϵ,na)          # rent decision
    hnf_act  = SharedArray{Float64}(nb,nϵ,na)          # house decision
    mnf_act  = SharedArray{Float64}(nb,nϵ,na)          # mortgage decision
    go_act   = SharedArray{Int}(nb,nϵ,na)              # decision to own

    @sync @distributed for i in CartesianIndices(view(vn_act,1,:,:,1))
        iϵ = i[1]; ia = i[2]; iy = (ia-1)*nϵ + iϵ
        for ib in eachindex(b_grid)
            bn,vn,cn,htn,hn,mn,gn  = solve_problems(pa_new,pr_new,ib,iϵ,ia,q_ret,q_act)
            vn_act[ib,iϵ,ia]       = vn    # value function
            cnf_act[ib,iϵ,ia]      = cn    # consumption function
            bnf_act[ib,iϵ,ia]      = bn    # bond function
            htnf_act[ib,iϵ,ia]     = htn   # decision to rent
            hnf_act[ib,iϵ,ia]      = hn    # own house
            mnf_act[ib,iϵ,ia]      = mn    # mortgage decision
            go_act[ib,iϵ,ia]       = gn    # decision to own
        end
    end

    out = calculate_statistics(pa_new,pr_new,vn_act,cnf_act,bnf_act,htnf_act,mnf_act,go_act)
    return (pa_new = out.pa_new, pr_new = out.pr_new, q_ret_new = out.q_ret_new, q_act_new=out.q_act_new)
end

The problem, which I did not foresee and cannot understand, is the following. When I run model_wrapper! many times to see what happens when I change parameters, my computer keeps allocating more and more RAM (I have 256 GB of RAM and 64 processors), up to the point where I cannot run the function anymore. Could this be the result of creating new shared arrays each time I run the function? Why are they not removed from the memory of the cores once model_compute is finished? Thanks for any assistance.

Are these global variables? I can’t seem to find them in model_wrapper!.

Have you checked for type stability with @code_warntype? Did you follow most of the Performance Tips from the docs?

This will probably allocate a new array. Why not use the two-argument version of findmax, e.g. findmax(i -> abs(out.q_act_new[i] - q_act[i]), eachindex(out.q_act_new, q_act))?
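To make the suggestion concrete, here is a minimal sketch comparing the allocating form with two forms that avoid the temporary array. It assumes Julia 1.7 or later (where findmax(f, domain) is available); q_old and q_new are small stand-in arrays, not the ones from the model. Note that the function passed to findmax receives one element of the domain at a time, so iterating over the shared indices is a simple way to reach both arrays.

```julia
# Stand-in arrays; in the thread these would be q_act and out.q_act_new.
q_old = [0.10 0.20; 0.30 0.40]
q_new = [0.15 0.20; 0.10 0.45]

# Allocating version: materialises the array of absolute differences first.
diff_alloc, idx_alloc = findmax(abs.(q_new .- q_old))

# Two-argument findmax (Julia 1.7+): f is applied to each index in turn,
# so no temporary array of differences is built.
diff_lazy, idx_lazy = findmax(i -> abs(q_new[i] - q_old[i]), eachindex(q_new, q_old))

# If only the value is needed, a generator avoids the temporary as well.
diff_only = maximum(abs(a - b) for (a, b) in zip(q_new, q_old))
```

For plain arrays the second form reports a linear index, whereas findmax on the difference array reports a CartesianIndex; LinearIndices(q_new)[idx_alloc] converts between the two.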

I’m guessing out.q_ret_new, hom_q as well as q_ret are matrices - if so, those operations will allocate an output array. Preallocate a working array and modify it directly instead.


If you’re saving all state created during model_wrapper! and model_compute, GC will never clean it up and delete it because you’re still referencing it.


The “out” variable is used inside the while loops when updating q_ret and q_act for example. It is also used outside the while loop to get the output of the model_wrapper! function and maintain type stability.
I’ve checked every part of the code (as best I could) using the @code_warntype macro, yes. I’ve also tried to follow the performance tips to the best of my ability.

I’ll try this, thanks. Is there any rule of thumb (or procedure) to know whether a line of code creates new arrays? I’m guessing the best way is to @time it and check the allocations, right? Though it’s sometimes difficult to understand why allocations are made.
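One way to check this, sketched below, is to wrap the line in a small function and measure it with @allocated after a warm-up call, so that compilation is not counted. The two update functions are hypothetical stand-ins that mirror the q_ret update from the thread; they are not the original code.

```julia
# Hypothetical update rules mirroring the q_ret update in the thread.
update_alloc(q, q_new, hom) = (1.0 - hom) * q_new + hom * q   # builds new arrays

function update_inplace!(q, q_new, hom)
    @. q = (1.0 - hom) * q_new + hom * q                       # fused, writes into q
    return q
end

q     = rand(50, 50)
q_new = rand(50, 50)

# Warm-up calls so that compilation is not part of the measurement.
update_alloc(q, q_new, 0.5)
update_inplace!(q, q_new, 0.5)

bytes_alloc   = @allocated update_alloc(q, q_new, 0.5)
bytes_inplace = @allocated update_inplace!(q, q_new, 0.5)
println("allocating: $bytes_alloc bytes, in place: $bytes_inplace bytes")
```

The exact byte counts depend on the Julia version, so the useful signal is the difference between the two numbers rather than their absolute values.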

hom_q is a scalar. q_ret_new and q_ret are arrays indeed. q_ret is an input to the model_wrapper function, so I thought that I was modifying it directly. But maybe I’m just creating a new binding when I write:

q_ret       = (1.0-hom_q)*out.q_ret_new + hom_q*q_ret

Would this change modify it instead of allocating a new one?

q_ret       .= (1.0-hom_q)*out.q_ret_new + hom_q*q_ret

Or maybe

q_ret[:,:,:,:]  = (1.0-hom_q)*out.q_ret_new + hom_q*q_ret

I thought that all arrays created inside model_wrapper! would be eliminated with the exception of those returned at the end. Apparently this is not the case. Thanks for your comments

I’m specifically asking about pa_new and pr_new, as those are not defined in the version of model_wrapper! you’ve given here (assuming both functions are defined separately at the top level).

As a rule of thumb, Julia will not modify anything in place unless you explicitly ask for it, most commonly by using a function ending in !. There is a convention to end the names of functions that modify their arguments with an exclamation mark. This is not enforced by the compiler: modifying functions don’t have to end in !, and those that do don’t necessarily modify their arguments. It’s just a convention.

So most operations on arrays (a = b + c where b and c are vectors) will allocate the result, which is bound to the name a. If a is already bound to another array (I’ll call it _a so we have a name), _a is not modified in place - this would be done with a .= b + c, which copies the result to _a. However, this still allocates the result of b + c. To combine the loop doing addition inside + with the loop copying to a, add another dot: a .= b .+ c (or equivalently, @. a = b + c, which places dots at all function calls and assignments).
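A small sketch of the difference between rebinding a name and writing into an existing array, using made-up vectors. The functions rebind and mutate! are hypothetical names introduced only for illustration. The indexed form q_ret[:,:,:,:] = ... from the question also writes into the existing array, but the right-hand side is still allocated first unless it is dotted.

```julia
b = [1.0, 2.0, 3.0]
c = [10.0, 20.0, 30.0]
a = zeros(3)
original = a                 # second name for the same array

a = b + c                    # rebinds `a` to a newly allocated array
rebound = a !== original     # `original` still holds the zeros

a = original
a .= b .+ c                  # fused broadcast, writes into the existing array
same_array = a === original

# Inside a function, plain assignment only rebinds the local name,
# so the caller's array is left untouched.
function rebind(q, q_new)
    q = 0.5 .* q_new .+ 0.5 .* q
    return nothing
end

# Broadcast assignment writes into the array the caller passed in.
function mutate!(q, q_new)
    q .= 0.5 .* q_new .+ 0.5 .* q
    return nothing
end

q_caller = [0.0, 0.0]
rebind(q_caller, [2.0, 4.0])
unchanged = q_caller == [0.0, 0.0]
mutate!(q_caller, [2.0, 4.0])     # q_caller now holds [1.0, 2.0]
```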

In general, allocations of mutable data structures are not elided the vast majority of the time, whereas immutable structures may be eliminated if they don’t escape the function (i.e. leave its scope). This applies to Julia’s built-in types as well as to user-defined types. There’s generally no distinction between the two: they’re equally powerful.

Well they are not eliminated while running the function - the memory has to be used at some point. If you accidentally leak some memory from a callee into a caller, that memory of course can’t be freed - it might be used in the caller after all.


You are right. They are defined outside the outer while loop and updated inside. I’ll edit that.

I understand that. Right now, the function allocates 750 MB of memory while it is running, which is high, but not a problem. I’ve managed to cut that down from the original 4.5 GiB by using your advice on updating arrays in place inside the statistics function.

Still, when model_wrapper! ends, it was my understanding that all variables and arrays that are not returned are eliminated from memory. However, as I run model_wrapper! several times in order to try out different parameter combinations for the non-linear function, I find that the RAM of my computer keeps filling up. When I look up the RAM usage of Julia and Juno in the task manager, they hold steady at 44 GiB of memory allocated, even as the total allocation is nearing 256 GiB. However, when I close the Juno window, RAM usage returns to normal levels. I just don’t understand what’s going on.

Here are pictures of my task manager as it stands. The memory usage will increase as I run model_wrapper! (independent runs), but it does not show up as Julia memory usage. Could this be a Juno problem?


Try calling GC.gc() between every iteration

While they may not be eliminated immediately (GC may hold on to the memory for a while until it gets a chance to run collection) it should definitely be freed after some time. You could insert an explicit GC.gc() to see if explicitly invoking would help… I’ve also found this issue from 0.6.2 way back when, which may be related. Maybe this as well.

I’m not sure how Juno and Julia interact, but what I do know is that Juno is, as far as I’m aware, no longer actively maintained and rather VSCode is the main endorsed development environment.

Still, explicitly calling GC.gc() may help.
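A minimal sketch of what that could look like between independent runs. Each worker process has its own garbage collector, so the sketch uses @everywhere from the Distributed standard library to trigger a collection on all processes, the master included. run_model is a hypothetical stand-in for one expensive call such as model_wrapper!.

```julia
using Distributed

# Hypothetical stand-in for one expensive model run.
run_model(scale) = sum(rand(1_000) .* scale)

results = Float64[]
for scale in (0.5, 1.0, 2.0)
    push!(results, run_model(scale))   # keep only the small summary value
    @everywhere GC.gc()                # full collection on every process
end
```

This only helps with memory that is no longer reachable; anything still referenced from a variable or container survives the collection.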

Thanks for the references. One of the issues could be the shared arrays. But isn’t GC for garbage collection within functions? The problem seems to appear after one has run the model_wrapper! function many times (independent runs). Could it be that the output is a named tuple that is not being updated in place?

So if I do the following:

output = model_wrapper!(pa,pr,q_ret,q_act)
output = model_wrapper!(pa,pr,q_ret,q_act)

I end up with 2 separate memory allocations. Is there a way to update “output” (a named tuple) in place? I’ve tried “.=”, but I get an error:

broadcasting over dictionaries and NamedTuples is reserved

Would I need to do this separately for each of the elements of the named tuple “output”?


GC works on all levels, it’s just a matter of what you as the programmer keep accessible in scope. If you can still reach it, GC can’t free it.

If you literally do output = model_wrapper!(..) and don’t keep a reference to the old output around (e.g. by pushing the results to an array or similar), GC will clean up. If you keep it accessible by giving it another name or saving it inside some other variable, GC can’t clean it up.

You run your model twice, so of course both invocations of model_wrapper! will allocate whatever they need in terms of memory inside of themselves. The way around that is passing everything you want to mutate into model_wrapper! to prevent it from allocating itself.

There is no magic auto-update of existing memory unless you explicitly ask for it. What you’ve written is an assignment of the result of model_wrapper! to the binding/variable name output. It does not automatically overwrite the contents of an existing output (and can’t, because NamedTuples are immutable anyway).

julia> a = (a=2,b=4)
(a = 2, b = 4)

julia> a.a = 4
ERROR: setfield!: immutable struct of type NamedTuple cannot be changed
Stacktrace:
 [1] setproperty!(x::NamedTuple{(:a, :b), Tuple{Int64, Int64}}, f::Symbol, v::Int64)
   @ Base ./Base.jl:43
 [2] top-level scope
   @ REPL[9]:1
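The tuple itself cannot be changed, but arrays stored in it are still ordinary mutable arrays, so each field can be overwritten separately. A minimal sketch with made-up field contents (the names q_ret and q_act follow the thread, the values do not):

```julia
output = (q_ret = zeros(2, 2), q_act = zeros(2, 2))
new_q_ret = fill(1.5, 2, 2)

buffer = output.q_ret          # remember which array the field refers to
output.q_ret .= new_q_ret      # overwrite the contents; the tuple is untouched

still_same_array = output.q_ret === buffer

# The same idea applied to every field that holds an array.
for name in keys(output)
    fill!(getfield(output, name), 0.0)
end
```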

That’s hard to say without seeing how model_wrapper! is used in your code. I suspect you’re keeping references to old instances around unintentionally, which clog your memory. If you can share how this function is used, there may be more information to share.


For those of you struggling with a similar problem: I have applied some of the potential solutions proposed here, which do reduce the size of allocations. However, they do not solve the core problem of high RAM usage even after the functions are done running. This issue might be related, so when it’s solved it may provide a solution to this problem.

Do you have the same issue when you are NOT using VS Code, just the REPL?

Sorry for the late reply. Thanks for the suggestion. I did and it made no difference.

For those who may be struggling with similar issues, I solved my problem. The problem was with using SharedArrays. In my example code, I have a function model_compute, which is iterated on and creates SharedArrays each time it is run. Unlike Arrays, these do not seem to be eliminated once the function has run, which means they continue to occupy space in memory. Therefore, I create the SharedArrays once, pass them to the function model_compute!, which updates them in place, as per @Sukera’s suggestion:

This takes care of the RAM problem.
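A rough sketch of that pattern, not the original code: the SharedArray is created once outside the loop and a mutating model_compute! fills it on every call. solve_point, the grid sizes and the scale argument are hypothetical stand-ins. The loop runs over a linear range, which is then mapped back to the two state indices.

```julia
using Distributed
@everywhere using SharedArrays

# Hypothetical stand-in for the per-state solver; defined on all processes.
@everywhere solve_point(ib, iϵ, ia, scale) = scale * (ib + 10iϵ + 100ia)

# Fills the preallocated buffer in place instead of creating a new SharedArray.
function model_compute!(vn::SharedArray{Float64,3}, scale)
    nb, nϵ, na = size(vn)
    states = CartesianIndices((nϵ, na))
    @sync @distributed for k in 1:(nϵ * na)
        iϵ, ia = Tuple(states[k])
        for ib in 1:nb
            vn[ib, iϵ, ia] = solve_point(ib, iϵ, ia, scale)
        end
    end
    return vn
end

nb, nϵ, na = 4, 3, 2
vn_act = SharedArray{Float64}(nb, nϵ, na)   # allocated once, outside the loop

for scale in (1.0, 2.0)
    model_compute!(vn_act, scale)           # reuses the same shared memory
end
```

With no worker processes added, the loop simply runs on the master process, so the sketch can be tried without calling addprocs first.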


Not sure that’s how it’s intended to be - the docs at least don’t mention anything of the sort. It might just be that GC didn’t run immediately. Do you have a smaller example where you can observe this behavior?

I hesitated before making such a claim, but I can tell you that this is the only thing that I changed in the code. I will attempt to produce a smaller example when I have the time
