Memory allocation within loop

question
#1

I have a for loop in which I create a function that I optimize, something like:

using Interpolations, Optim
function PolicyFunction()
    V = Array{Float64}(undef, 201, 40)
    n = Array{Float64}(undef, 201, 40)

    grid_h = range(10.0, stop = 300, length = 201)

    for j = 40:-1:1
        if j == 40 # Last period
            nextV = zeros(201)
        else
            nextV = V[:,j+1]
        end

        # Linerar interpolation for this period
        MyItp = interpolate((grid_h,), nextV, Gridded(Linear()))
        V_tw = extrapolate(MyItp, Line())
        for (ind_h, h) in enumerate(grid_h)
            # Define aux function to maximise
            aux_f(n_td) = log(1-n_td) + (0.95*V_tw(n_td^2.0))
            result = optimize(aux_f, 0.01, 0.99, GoldenSection())

            Optim.converged(result) || error("Failed to converge")
            n[ind_h, j] = Optim.minimizer(result)
            V[ind_h, j] = Optim.minimum(result)
        end
    end
    return n
end

With this implementation I get:

@time PolicyFunction()
  0.106507 seconds (1.68 M allocations: 32.176 MiB, 3.05% gc time)

I’m happy with the time but I was surprised with the memory allocation (in my real application I get something like 148.13 M allocations: 2.238 GiB, 2.08% gc time. When I use a more rudimentary solution like:

function PolicyFunction()
    V = Array{Float64}(undef, 201, 40)
    n = Array{Float64}(undef, 201, 40)

    grid_h = range(10.0, stop = 300, length = 201)

    for j = 40:-1:1
        # Define vector for next period
        if j == 40 # Last period
            nextV = zeros(201)
        else
            nextV = V[:,j+1]
        end

        # Linerar interpolation for this period
        MyItp = interpolate((grid_h,), nextV, Gridded(Linear()))
        V_tw = extrapolate(MyItp, Line())

        for (ind_h, h) in enumerate(grid_h)
            V_max = - Inf
            for n_td in range(0.01, stop = 0.99, length = 201)
                V_td = log(1-n_td) + (0.95*V_tw(n_td^2.0))

                if V_td > V_max
                    V_max = V_td
                    n[ind_h, j] = n_td
                    V[ind_h, j] = V_td
                end
            end
        end
    end
    return n
end

The time increases (of course depends on the grid points for n_td) but the use of memory is much lower (in my real application the difference is even more dramatic):

0.195065 seconds (150.49 k allocations: 7.371 MiB, 1.22% gc time)

Can anybody explain why is there such a difference in memory allocation? Is it because I create a function in each iteration? Is there a way around it?

#2

Optim probably allocates some work buffers in optimize.

If you are happy with the time, do you really care about how many allocations that occurred?

1 Like
#3

I get similar performance when I use my own implementation of the golden section.

I was thinking that this high use of memory might mask a problem somewhere or inefficient coding…

Edit: This function is part of a bigger iteration and ends up being called many times.

#4

Yeah, that is sometimes the case. The only way to really know is to do some profiling and try to see where time is spent. And then you could try optimize that and see if reducing allocations help with performance.

1 Like
#5

Any tutorial on how to do profiling?

Thanks!

#6

There are a few profiling options, depending on your feelings about GUIs. Julia’s built-in profiling tools will print a text-based function call tree annotated with the number of times each function was called:

using Profile
Profile.clear()
@profile PolicyFunction()
Profile.print()

ProfileView gives you a graphical view of the tree, and Juno’s built-in @profiler macro (my preferred method) gives a view of the tree as well as a histogram of call frequency overlaid on the lines of your script.

1 Like
#7

Optim creates a new ResultSet for every optimization that you do, it does not reuse it. This leads to large memory allocations when your optimization is in your inner loop like this.

However! From what I remember, I did hack together my own univariate optimizer exactly modeled off the one in the package but that didn’t create structs for every run. Memory allocations decreased, but the performance was really not much different (again, just my recollection). I think they just get garbage collected fairly efficiently or something.

1 Like