How to delete a variable from memory in Julia?

Below is a very memory- and performance-intensive part of my code, which aims to solve a system of non-linear equations via tensor contractions.

Performance and memory intensive code
function calculate_residual_memeff(T2)
    # g_vvoo,g_voov,g_vovo,g_oovv,g_oooo,g_vvvv= deserialize("g_vvoo.jlbin"),deserialize("g_voov.jlbin"),deserialize("g_vovo.jlbin"),deserialize("g_oovv.jlbin"),deserialize("g_oooo.jlbin"),deserialize("g_vvvv.jlbin")
    g_vvoo::Array{Float64,4} = deserialize("g_vvoo.jlbin")
    g_voov::Array{Float64,4} = deserialize("g_voov.jlbin")
    g_vovo::Array{Float64,4} = deserialize("g_vovo.jlbin")
    g_oovv::Array{Float64,4} = deserialize("g_oovv.jlbin")
    g_oooo::Array{Float64,4} = deserialize("g_oooo.jlbin")
    g_vvvv::Array{Float64,4} = deserialize("g_vvvv.jlbin")
    fvv::Array{Float64,2} , foo::Array{Float64,2} = deserialize("fvv.jlbin"),deserialize("foo.jlbin")
    nv::Int64 = deserialize("nv.jlbin")
    nocc::Int64 = deserialize("nocc.jlbin")
    R2u::Array{Float64,4} = zeros(Float64,nv,nv,nocc,nocc)
    R2::Array{Float64,4} = zeros(Float64,nv,nv,nocc,nocc)
    @tensor begin
        Trm1[a_1,a_2,i_1,i_2] := 0.5* g_vvoo[a_1,a_2,i_1,i_2]
        R2u[a_1,a_2,i_1,i_2] += Trm1[a_1,a_2,i_1,i_2]
        # @notensor Trm1 = nothing
        # @notensor g_vvoo = nothing
        Trm2[a_1,a_2,i_1,i_2] := - g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2]
        R2u[a_1,a_2,i_1,i_2] += Trm2[a_1,a_2,i_1,i_2]
        # @notensor Trm2 = nothing
        Trm3[a_1,a_2,i_1,i_2] := - g_vovo[a_2,i_3,a_3,i_2] * T2[a_1,a_3,i_1,i_3]
        R2u[a_1,a_2,i_1,i_2] += Trm3[a_1,a_2,i_1,i_2]
        # @notensor Trm3 = nothing
        Trm4[a_1,a_2,i_1,i_2] := - g_vovo[a_2,i_3,a_3,i_1] * T2[a_1,a_3,i_3,i_2]
        R2u[a_1,a_2,i_1,i_2] += Trm4[a_1,a_2,i_1,i_2]
        # @notensor Trm4 = nothing
        # @notensor g_vovo = nothing
        Trm5[a_1,a_2,i_1,i_2] := 2*g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_2,i_3]
        R2u[a_1,a_2,i_1,i_2] += Trm5[a_1,a_2,i_1,i_2]
        # @notensor Trm5 = nothing
        # @notensor g_voov = nothing
        Trm6[a_1,a_2,i_1,i_2] := 0.5*g_oooo[i_3,i_4,i_1,i_2] * T2[a_1,a_2,i_3,i_4]
        R2u[a_1,a_2,i_1,i_2] += Trm6[a_1,a_2,i_1,i_2]
        # @notensor Trm6 = nothing
        # @notensor g_oooo = nothing
        Trm7[a_1,a_2,i_1,i_2] := fvv[a_2,a_3] * T2[a_1,a_3,i_1,i_2]
        R2u[a_1,a_2,i_1,i_2] += Trm7[a_1,a_2,i_1,i_2]
        # @notensor Trm7 = nothing
        # @notensor fvv = nothing
        Trm8[a_1,a_2,i_1,i_2] := + 0.5*g_vvvv[a_1,a_2,a_3,a_4] * T2[a_3,a_4,i_1,i_2]
        R2u[a_1,a_2,i_1,i_2] += Trm8[a_1,a_2,i_1,i_2]
        # @notensor Trm8 = nothing
        # @notensor g_vvvv = nothing
        Trm9[a_1,a_2,i_1,i_2] := - foo[i_3,i_2] * T2[a_1,a_2,i_1,i_3]
        R2u[a_1,a_2,i_1,i_2] += Trm9[a_1,a_2,i_1,i_2]
        # @notensor Trm9 = nothing
        # @notensor foo = nothing
        B1[i_4,a_4,a_1,i_1] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_1,i_3])
        R2u[a_1,a_2,i_1,i_2] +=  B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_2,i_4]- B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
        # @notensor B1 = nothing
        B2[i_4,a_4,a_1,i_1] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_3,i_1])
        R2u[a_1,a_2,i_1,i_2] += B2[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
        # @notensor B2 = nothing
        B3[i_4,i_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_3,i_2])
        R2u[a_1,a_2,i_1,i_2] +=  -B3[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
        # @notensor B3 = nothing
        B4[i_4,a_3,a_1,i_1] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_1,i_3])
        R2u[a_1,a_2,i_1,i_2] += -B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_2,i_4] + B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_4,i_2]
        # @notensor B4 = nothing
        B5[a_3,a_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_4,i_4,i_3])
        R2u[a_1,a_2,i_1,i_2] += B5[a_3,a_2] * T2[a_1,a_3,i_1,i_2]
        # @notensor B5 = nothing
        B6[i_4,i_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_2,i_3])
        R2u[a_1,a_2,i_1,i_2] += B6[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
        # @notensor B6 = nothing
        B7[a_4,a_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_3,i_4,i_3])
        R2u[a_1,a_2,i_1,i_2] += - B7[a_4,a_2] * T2[a_1,a_4,i_1,i_2]
        # @notensor B7 = nothing
        B8[i_4,a_3,a_1,i_2] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_3,i_2])
        R2u[a_1,a_2,i_1,i_2] +=  +B8[i_4,a_3,a_1,i_2] * T2[a_2,a_3,i_4,i_1]
        # @notensor B8 = nothing
        B9[i_3,i_4,i_1,i_2] := 0.5* (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_1,i_2])
        R2u[a_1,a_2,i_1,i_2] += + B9[i_3,i_4,i_1,i_2] * T2[a_1,a_2,i_3,i_4]
        # @notensor B9 = nothing
        # @notensor g_oovv = nothing
        R2[a,b,i,j] := R2u[a,b,i,j] + R2u[b,a,j,i]
        # @notensor R2u = nothing
    end
    return R2
end

I want to write type-stable code that is also memory efficient. For example, after the variable Trm1 has been used once, it is no longer needed, so I want to delete it and free the associated memory. One of the approaches I read about is to assign the variable to nothing and then manually run GC.gc(). However, this causes type conversion problems, since Trm1 has type Array{Float64,4} and therefore cannot be assigned nothing. Is there any way to overcome this issue and free the memory associated with Trm1 just after its work has ended?

If you split all such blocks (there appear to be about 10 of them in your code) into individual functions, does the automatic garbage collector take care of it? As you said, the Trm1 variables aren't needed later, so if they are only computed inside a function, the GC can reclaim them after the function returns.

3 Likes

I’m not familiar with whatever packages you were using (you didn’t list them in your example), nor did you provide real or made-up data, so we couldn’t run your example even if we did know them. My remarks are therefore based on guesses and casual inspection, and may be wrong.

It looks like you don’t need all those different TrmX variables. You might be able to just reassign a single Trm in every one of your calculations; when it is reassigned, the old array becomes inaccessible and eligible for garbage collection. Alternatively, you could probably assign Trm1 = Float64[;;;;] once you’re done with it, which would allow the original to be GC’d. Although, as the above user mentioned, it’s not common to need to dismantle variables mid-function, so you might think about breaking this into more functions, if you can make that work.
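For concreteness, a rough sketch of the single-name idea (untested; note that all the Trm terms in your code have the same nv × nv × nocc × nocc shape, so after the first allocation you can even overwrite the buffer in place with @tensor's = assignment instead of allocating a new array with :=):

@tensor Trm[a_1,a_2,i_1,i_2] := 0.5 * g_vvoo[a_1,a_2,i_1,i_2]                   # first term: allocate the buffer once
@tensor R2u[a_1,a_2,i_1,i_2] += Trm[a_1,a_2,i_1,i_2]
@tensor Trm[a_1,a_2,i_1,i_2] = -g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2]   # later terms: = writes into the same buffer
@tensor R2u[a_1,a_2,i_1,i_2] += Trm[a_1,a_2,i_1,i_2]
# ... repeat for the remaining Trm terms ...
Trm = Float64[;;;;]   # after the last use, drop the reference so the array can be GC'd (Julia >= 1.8 empty-array syntax)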

It’s not obvious why you even need the Trm variables at all. It looks like you could just combine those lines with the R2u updates and avoid the intermediate arrays entirely.

As a tangential, stylistic note, it would probably be a cleaner design if you separated the data loading and the math into separate functions.

2 Likes

The final expression to be calculated, R2, requires a lot of tensor contractions. The TrmX variables are used to split each tensor contraction into at most a binary contraction.

And since the TrmX's are not needed once these binary contractions have been computed, I am trying to free the memory associated with each of them immediately after it has made its contribution to R2u.

Here, too, my aim is not to keep the big tensors g_xxxx in memory. I want to read them from disk every time the function calculate_residuals_memeff(T2) is called. I guess the alternative you are suggesting would be to read the data in some other function every time calculate_residuals_memeff(T2) is needed and then pass it as parameters; however, I don't see how that would help. Note that we want to avoid keeping these tensors in memory in any part of the code other than calculate_residuals_memeff(T2).

The relevant packages being used here are TensorOperations.jl and Serialization (part of the Julia standard library). If you are interested in running the code yourself, the files being read can be found here, and you can use the values in T2.jlbin included in the link to call the function calculate_residuals_memeff(T2).

I have not tried that yet because it seems horribly tedious. Not only would I have to define 9 new functions for the TrmX variables, but also 9 more for the BX variables.

The number of functions you would have to define if you follow @Datseris's suggestion seems much smaller than what you say: almost all the operations are repetitive, so by passing some parameters you should be able to reduce the 18 functions to 3 or 4, or even fewer, I think (see the sketch below). This might also make the code clearer. You can try with just one block initially to see if memory usage improves.
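For instance, one way to parameterize the repeated "binary contraction, then accumulate into R2u" pattern is via TensorOperations' ncon, which takes the index network as a runtime argument. The helper name and the network encoding below are illustrative, not from the thread, and untested:

using TensorOperations   # provides ncon

function addres_binary!(R2u, coeff, A, B, network)
    # Hypothetical generic helper: contract A with B according to `network`
    # (ncon convention: positive labels are summed over; negative labels
    # -1,-2,-3,-4 become the indices of the result, in that order),
    # then accumulate coeff * result into R2u.
    tmp = ncon([A, B], network)
    R2u .+= coeff .* tmp
    return R2u
end

# Example: the Trm2 term, -g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2].
# Open indices a_1,a_2,i_1,i_2 -> -1,-2,-3,-4; contracted i_3 -> 1, a_3 -> 2.
addres_binary!(R2u, -1.0, g_voov, T2, [[-1, 1, -3, 2], [-2, 2, 1, -4]])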

2 Likes

Are you certain this is exactly what’s happening? To me (and unless TensorOperations is somehow messing with this), it looks like Trm1 = nothing should be legal, and the problem would instead be with g_vvoo = nothing, because you type-declared the variable with g_vvoo::Array{Float64,4} = deserialize("g_vvoo.jlbin"). If you instead wrote g_vvoo = deserialize("g_vvoo.jlbin")::Array{Float64,4} (a type assertion on the value rather than a declaration on the variable), it would be legal to later write g_vvoo = nothing.
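To make that distinction concrete, a toy sketch (hypothetical function name, small placeholder arrays instead of the deserialized data):

function illustrate_declaration_vs_assertion()
    x::Array{Float64,4} = zeros(2, 2, 2, 2)   # declares the variable: x must always hold an Array{Float64,4}
    # x = nothing                             # would throw: cannot convert an object of type Nothing to Array{Float64,4}

    y = zeros(2, 2, 2, 2)::Array{Float64,4}   # asserts only the value's type; y itself stays unconstrained
    y = nothing                               # fine: the old array becomes unreachable and GC-eligible
    return x, y
end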

If you’re concerned about peak memory usage, why do you load all your data up-front? You should be able to notably reduce your peak memory usage by loading each variable immediately before you use it, doing all your operations with it, and then disposing of it afterward (either by reaching the end of a function or by reassigning it). That actually suggests an easy way to organize this into multiple subfunctions: make each subfunction compute the contribution from a single g_XXXX. For example,

function compute_voov(T2::AbstractArray{<:Any, 4}, g_voov::AbstractArray{<:Any, 4})
    @tensor begin
        Trm[a_1,a_2,i_1,i_2] := - g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2] # Trm2 term
        Trm[a_1,a_2,i_1,i_2] += 2*g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_2,i_3] # Trm5 term
    end
    return Trm
end

# add the voov component
@tensor R2u[a_1,a_2,i_1,i_2] += compute_voov(T2, deserialize("g_voov.jlbin")::Array{Float64, 4})[a_1,a_2,i_1,i_2]

By the time this line has completed, both the deserialized variable and the generated Trm are no longer accessible and can be GC’d.

You could also make a version that updates R2u directly, like

function add_voov!(R2u::AbstractArray{<:Any, 4}, T2::AbstractArray{<:Any, 4}, g_voov::AbstractArray{<:Any, 4})
    @tensor begin
        R2u[a_1,a_2,i_1,i_2] += - g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2] # Trm2 term
        R2u[a_1,a_2,i_1,i_2] += 2*g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_2,i_3] # Trm5 term
    end
    return R2u
end

but I don’t anticipate this making a notable difference to performance.

1 Like

If you read the section on performance in the Julia manual, I believe you will not find any suggestion to do what you are attempting. This is something you simply leave to the GC. Instead, avoid unnecessarily allocating objects in the first place.

1 Like

I tried to do as you suggested and split the computation into several mini functions whose entire job is to calculate the intermediate terms and then return them to the main function.

Main Computation
function calcresnew(T2::Array{Float64,4})
    nv = deserialize("nv.jlbin")
    nocc = deserialize("nocc.jlbin")
    R2u = zeros(Float64,nv,nv,nocc,nocc)
    R2 =  zeros(Float64,nv,nv,nocc,nocc)
    R2u = addres_gvvoo(R2u,nv,nocc)
    R2u = addres_gvoov(R2u,nv,nocc,T2)
    R2u = addres_gvovo(R2u,nv,nocc,T2)
    R2u = addres_goooo(R2u,nv,nocc,T2)
    R2u = addres_gvvvv(R2u,nv,nocc,T2)
    R2u = addres_fvv(R2u,nv,nocc,T2)
    R2u = addres_foo(R2u,nv,nocc,T2)
    # R2u = addres_goovv(R2u,nv,nocc,T2)  #In this all the BX terms are still in memory at once
    R2u = addres_goovvb1(R2u,nv,nocc,T2)
    R2u = addres_goovvb2(R2u,nv,nocc,T2)
    R2u = addres_goovvb3(R2u,nv,nocc,T2)
    R2u = addres_goovvb4(R2u,nv,nocc,T2)
    R2u = addres_goovvb5(R2u,nv,nocc,T2)
    R2u = addres_goovvb6(R2u,nv,nocc,T2)
    R2u = addres_goovvb7(R2u,nv,nocc,T2)
    R2u = addres_goovvb8(R2u,nv,nocc,T2)
    R2u = addres_goovvb9(R2u,nv,nocc,T2)
    @tensor R2[a,b,i,j] += R2u[a,b,i,j] + R2u[b,a,j,i]
    return R2
end
Mini functions created to reduce memory usage
function addres_gvvoo(R2u,nv,nocc)
    g_vvoo = deserialize("g_vvoo.jlbin")
    @tensor Trm1[a_1,a_2,i_1,i_2] := 0.5* g_vvoo[a_1,a_2,i_1,i_2]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm1[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_gvoov(R2u,nv,nocc,T2)
    g_voov = deserialize("g_voov.jlbin")
    @tensor Trm2[a_1,a_2,i_1,i_2] := - g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_3,i_2]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm2[a_1,a_2,i_1,i_2]
    @tensor Trm5[a_1,a_2,i_1,i_2] := 2*g_voov[a_1,i_3,i_1,a_3] * T2[a_2,a_3,i_2,i_3]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm5[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_gvovo(R2u,nv,nocc,T2)
    g_vovo = deserialize("g_vovo.jlbin")
    @tensor Trm3[a_1,a_2,i_1,i_2] := - g_vovo[a_2,i_3,a_3,i_2] * T2[a_1,a_3,i_1,i_3]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm3[a_1,a_2,i_1,i_2]
    @tensor Trm4[a_1,a_2,i_1,i_2] := - g_vovo[a_2,i_3,a_3,i_1] * T2[a_1,a_3,i_3,i_2]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm4[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_goooo(R2u,nv,nocc,T2)
    g_oooo = deserialize("g_oooo.jlbin")
    @tensor Trm6[a_1,a_2,i_1,i_2] := 0.5*g_oooo[i_3,i_4,i_1,i_2] * T2[a_1,a_2,i_3,i_4]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm6[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_gvvvv(R2u,nv,nocc,T2)
    g_vvvv = deserialize("g_vvvv.jlbin")
    @tensor Trm8[a_1,a_2,i_1,i_2] := + 0.5*g_vvvv[a_1,a_2,a_3,a_4] * T2[a_3,a_4,i_1,i_2]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm8[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_fvv(R2u,nv,nocc,T2)
    fvv = deserialize("fvv.jlbin")
    @tensor Trm7[a_1,a_2,i_1,i_2] := fvv[a_2,a_3] * T2[a_1,a_3,i_1,i_2]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm7[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_foo(R2u,nv,nocc,T2)
    foo = deserialize("foo.jlbin")
    @tensor Trm9[a_1,a_2,i_1,i_2] := - foo[i_3,i_2] * T2[a_1,a_2,i_1,i_3]
    @tensor R2u[a_1,a_2,i_1,i_2] += Trm9[a_1,a_2,i_1,i_2]
    return R2u
end

function addres_goovv(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B1[i_4,a_4,a_1,i_1] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_2,i_4]- B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
    @tensor B2[i_4,a_4,a_1,i_1] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_3,i_1])
    @tensor R2u[a_1,a_2,i_1,i_2] += B2[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
    @tensor B3[i_4,i_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_3,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  -B3[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
    @tensor B4[i_4,a_3,a_1,i_1] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += -B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_2,i_4] + B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_4,i_2]
    @tensor B5[a_3,a_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_4,i_4,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B5[a_3,a_2] * T2[a_1,a_3,i_1,i_2]
    @tensor B6[i_4,i_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_2,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B6[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
    @tensor B7[a_4,a_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_3,i_4,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += - B7[a_4,a_2] * T2[a_1,a_4,i_1,i_2]
    @tensor B8[i_4,a_3,a_1,i_2] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_3,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  +B8[i_4,a_3,a_1,i_2] * T2[a_2,a_3,i_4,i_1]
    @tensor B9[i_3,i_4,i_1,i_2] := 0.5* (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_1,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] += + B9[i_3,i_4,i_1,i_2] * T2[a_1,a_2,i_3,i_4]
end

function addres_goovvb1(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B1[i_4,a_4,a_1,i_1] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_2,i_4]- B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
    return R2u
end

function addres_goovvb2(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B2[i_4,a_4,a_1,i_1] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_3,i_1])
    @tensor R2u[a_1,a_2,i_1,i_2] += B2[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
    return R2u
end

function addres_goovvb3(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B3[i_4,i_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_3,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  -B3[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
    return R2u
end

function addres_goovvb4(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B4[i_4,a_3,a_1,i_1] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += -B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_2,i_4] + B4[i_4,a_3,a_1,i_1] * T2[a_2,a_3,i_4,i_2]
    return R2u
end

function addres_goovvb5(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B5[a_3,a_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_4,i_4,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B5[a_3,a_2] * T2[a_1,a_3,i_1,i_2]
    return R2u
end

function addres_goovvb6(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B6[i_4,i_2] := (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_2,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B6[i_4,i_2] * T2[a_1,a_2,i_1,i_4]
    return R2u
end

function addres_goovvb7(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B7[a_4,a_2] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_2,a_3,i_4,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += - B7[a_4,a_2] * T2[a_1,a_4,i_1,i_2]
    return R2u
end

function addres_goovvb8(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B8[i_4,a_3,a_1,i_2] := 0.5*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_4,i_3,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] +=  +B8[i_4,a_3,a_1,i_2] * T2[a_2,a_3,i_4,i_1]
    return R2u
end

function addres_goovvb9(R2u,nv,nocc,T2)
    g_oovv = deserialize("g_oovv.jlbin")
    @tensor B9[i_3,i_4,i_1,i_2] := 0.5* (g_oovv[i_3,i_4,a_3,a_4] * T2[a_3,a_4,i_1,i_2])
    @tensor R2u[a_1,a_2,i_1,i_2] += + B9[i_3,i_4,i_1,i_2] * T2[a_1,a_2,i_3,i_4]
    return R2u
end

The function addres_goovv(..) has been further split into nine parts so that the BX terms are never in memory at the same time. Is there anything else you would suggest?

It is not necessary or useful to pass nv or nocc to all your helper functions; you never use either within them. Also, I’m fairly sure that deserialized values are type-unstable (and you’ve dropped the annotations that made them stable here). You should probably add a type assertion to every deserialize call to restore type certainty. For example, write nv = deserialize("nv.jlbin")::Int, g_vvoo = deserialize("g_vvoo.jlbin")::Array{Float64, 4}.

It seems like clumsy design to load data within each of these computational helper functions. What if you already have the data (because you already loaded it, or freshly computed it)? I doubt it matters here (after type-asserting the deserialize calls; it definitely does if you don't!), but separating loading from computation also makes it easier to take advantage of this performance tip. I encourage you to look again at my example functions above to see this pattern. Note that this would also make the type annotations less necessary, though they would still help.
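For example, the caller might look like this (a brief sketch reusing the add_voov! helper from above; untested):

g_voov = deserialize("g_voov.jlbin")::Array{Float64, 4}  # load (or compute) the data wherever convenient
add_voov!(R2u, T2, g_voov)                               # the helper only does math and never touches the disk
g_voov = nothing                                         # optional: drop the reference once this contribution is done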

In some situations, it seems that you create an intermediate variable without need. You should benchmark it, but I am fairly sure TensorOperations would run with similar-or-better speed and less memory use with

function addres_gvvoo(R2u,nv,nocc)
    # note: I recommend you load the data outside of this function
    g_vvoo = deserialize("g_vvoo.jlbin")::Array{Float64, 4} # this annotation is probably significant
    @tensor R2u[a_1,a_2,i_1,i_2] += 0.5 * g_vvoo[a_1,a_2,i_1,i_2] # Trm1
    return R2u
end

It looks like your functions involving TrmX variables can be improved this way but your BX functions cannot.

Your broken-up goovvbX functions have the unnecessary cost of repeatedly reading "g_oovv.jlbin" from disk. You should load that file once and then pass the loaded variable to each function instead. Alternatively, your big addres_goovv would probably be fine if you simply gave all your BX variables the same name B, so that each old array definitely became inaccessible. Or maybe (though I'm pretty skeptical) the compiler is smart enough to free those variables earlier anyway. It might work if you used a let block to introduce scope, like

let
    @tensor B1[i_4,a_4,a_1,i_1] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_2,i_4] - B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
end

but it may take some finagling to make that work with the @tensor macro (maybe let B1 = @tensor B1[... would make it work, if the plain let block doesn’t natively?).

I would suggest you end the names of all these functions that modify the passed R2u variable with ! (e.g., addres_gvovo!). This convention indicates that the function mutates an input, which can make reasoning about the code considerably easier, so that you don’t forget that this is happening. Also, your R2u = addres_XXXX(R2u,...) works equally well as addres_XXXX(R2u,...) (i.e., without the R2u = ) because of this mutation. The array is modified regardless of whether you return and reassign it (unlike in MATLAB or some other languages).
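Putting the load-once advice and the ! convention together, one of the goovvbX helpers might be reworked roughly like this (a sketch with hypothetical names; the remaining B-term helpers would be reworked the same way):

function addres_goovvb1!(R2u, T2, g_oovv)   # trailing !: R2u is mutated in place
    @tensor B1[i_4,a_4,a_1,i_1] := 2*(g_oovv[i_3,i_4,a_3,a_4] * T2[a_1,a_3,i_1,i_3])
    @tensor R2u[a_1,a_2,i_1,i_2] += B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_2,i_4] - B1[i_4,a_4,a_1,i_1] * T2[a_2,a_4,i_4,i_2]
    return R2u
end

# caller: read the file once, then reuse the loaded array
g_oovv = deserialize("g_oovv.jlbin")::Array{Float64, 4}
addres_goovvb1!(R2u, T2, g_oovv)   # no R2u = needed; the array is updated in place
addres_goovvb2!(R2u, T2, g_oovv)   # hypothetical: the other B-term helpers, reworked analogously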

2 Likes

Assigning a variable to nothing is advice mainly meant for variables in the global scope.

For variables already in a local scope, such as inside your function, the best approach is to make it clear that the variable has gone out of scope. There are examples of let blocks above, which introduce a nested local scope and make it clear to the compiler that the variable may be garbage collected.

To force garbage collection, it may be necessary to invoke GC.gc() multiple times.

For a while the recommended number of invocations was four; recent improvements have reduced that to three.

It is not clear to me that you should actually do this within the performance-intensive part of your code unless you are running out of memory. You might instead want to run (GC.gc(); GC.gc(); GC.gc(); GC.gc()) immediately after running the function.

Here is a demonstration of how to track GC finalization on a specific variable.

julia> function foo()
           A = rand(8, 1024, 1024)
           finalizer(A) do _
               @async println("A is finalizing")
           end
           s =  sum(A)
           return s
       end
foo (generic function with 1 method)

julia> GC.gc()

julia> foo()
4.1942669190884633e6

julia> GC.gc()
A is finalizing

Now that we can monitor when A gets garbage collected, we see how many executions of foo() will occur before the Julia GC decides it is time to collect garbage.

julia> foo()
4.193281027214226e6

julia> foo()
4.194964042450877e6

julia> foo()
4.194420127904231e6

julia> foo()
4.1941064345511207e6

julia> foo()
4.194651283962144e6

julia> foo()
4.1936599618271324e6

julia> foo()
4.19254885724495e6

julia> foo()
4.193773437489507e6

julia> foo()
4.194397761604629e6

julia> foo()
4.194393810160885e6

julia> foo()
A is finalizing
A is finalizing
A is finalizing4.195333143475847e6


julia> 
julia> A is finalizing
A is finalizing
A is finalizing
A is finalizing
A is finalizing
A is finalizing

We see the garbage collector is a bit lazy. Each execution of foo() only allocates 64 MB of memory. There is no real need to run garbage collection until after multiple runs.

If we allocate more memory, garbage collection runs immediately.

julia> function foo()
           A = rand(1024, 1024, 1024)
           finalizer(A) do _
               @async println("A is finalizing")
           end
           s =  sum(A)
           return s
       end
foo (generic function with 1 method)

julia> foo()
A is finalizing
5.368781688997142e8

Similarly, we see that the let block has the same effect.

julia> let A = rand(1024, 1024, 1024)
           finalizer(A) do _
               @async println("A is finalizing")
           end
           s = sum(A)
       end
A is finalizing
5.368611167706951e8

If we do call GC.gc() after we are done using A completely, then it will also be collected.

julia> let A = rand(8, 1024, 1024)
           finalizer(A) do _
               @async println("A is finalizing")
           end
           s = sum(A)
           GC.gc()
           return s
       end
A is finalizing
4.1942064558408484e6
3 Likes

Shouldn’t the garbage collector run automatically and clear out variables like TrmX once the helper function exits, since there is no reference to them anymore?

It will eventually, but only if required, which means only if Julia is running short of memory.

3 Likes