Reducing (time spent in) garbage collection

I’m using the code posted below to create a simple JuMP model of variable size. Depending on the size and on how I construct the problem, somewhere between 20% and 80% of the total time is spent in garbage collection (problems of a relevant size consistently spend > 80% in GC). For more detailed timings see this post.

To work around this I disabled the garbage collector and am triggering it manually at constant intervals. This

  • reduces the time spent in GC to around 25% of the total time, but
  • seems to break something, since I now run out of memory AFTER re-enabling the garbage collector, calling GC.gc(true) and passing the model to a solver (with automatic GC this does not happen, so the model plus whatever memory the solver needs normally fits into RAM easily)

I have found some posts on garbage collection (none of which reached a conclusion relevant here) and close to no real information online on how the GC can be influenced from the outside, besides turning it on and off. Is there any way to parameterize the intervals it normally runs at, etc.?
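The only startup-level knob I’m aware of (an assumption on my part, and it requires Julia 1.9 or newer) is the heap-size hint, which asks the runtime to collect before the heap grows past a given size:

```julia
# Assumption: Julia 1.9+. The hint is set at startup, not from within a session:
#
#     julia --heap-size-hint=400G script.jl
#
# From inside a session the live heap can at least be inspected:
println("live heap: ", round(Base.gc_live_bytes() / 2^30; digits=2), " GiB")
```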

Is there anything “non JuMP related” that I can do to improve this? Is there an efficient way to track the current amount of allocated memory, and can that be used to trigger a full GC only when it is really necessary?

Basically, I want to reduce total computational time by keeping garbage collection to a minimum. If I know the machine has 512 GB of RAM, I could risk using 500 GB of it and only trigger a single GC when I hit that mark, then repeat.
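Something like the following sketch is what I have in mind. The 0.97 budget and the helper name `maybe_gc!` are placeholders, and `Sys.total_memory()` / `Sys.free_memory()` measure machine-wide usage rather than this process alone:

```julia
# Sketch only: run a full collection manually once memory use crosses a
# budget. Note: total - free is MACHINE-WIDE usage, not just this process.
function maybe_gc!(budget_fraction::Float64 = 0.97)
    total = Sys.total_memory()
    used  = total - Sys.free_memory()
    if used > budget_fraction * total
        was_enabled = GC.enable(true)   # GC.enable returns the previous state
        GC.gc(true)                     # single full collection
        GC.enable(was_enabled)
    end
    return nothing
end
```

This would replace the unconditional GC.enable/GC.gc/GC.enable dance inside the block loop below.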

Current code used to construct the JuMP model, with a garbage collection pass every 100 “blocks”:

using JuMP
using GLPK

"""
	Create a simple JuMP model with a specified size
"""
function create_model(size)
    T = 1:10000
    T2 = 2:10000

    # Run one full collection up front, then take manual control of the GC.
    GC.gc(true)
    GC.enable(false)
    model = JuMP.direct_model(GLPK.Optimizer())
    set_time_limit_sec(model, 0.001)

    for start in 1:100:size
        I = start:(start+99)
        x = @variable(model, [I, T], lower_bound=0, upper_bound=100)
        @constraint(model, [i in I, t in T2], x[i, t] - x[i, t-1] <= 10)
        @constraint(model, [i in I, t in T2], x[i, t-1] - x[i, t] <= -10)
        # Note: each call to @objective replaces the previous objective.
        @objective(model, Min, sum(x[i, t] for t in T for i in I))

        # Re-enable the GC just long enough for one incremental collection.
        GC.enable(true)
        GC.gc(false)
        GC.enable(false)
    end

    GC.enable(true)
    return model
end


"""
	Solve the model (due to the time limit this effectively just passes it to GLPK)
"""
function solve_model!(model)
	JuMP.optimize!(model)
end


for i in [100, 200, 500]
    println("Size: $i")

    @time (model = create_model(i))
    @time solve_model!(model)
end
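For measuring, the GC share can also be read programmatically instead of off `@time`’s printout, via the `gctime` field that `@timed` returns (shown here with a stand-in allocation-heavy workload rather than `create_model`):

```julia
# `@timed` returns a NamedTuple; `gctime` is the seconds spent in GC
# during the call. Stand-in workload instead of create_model(i):
stats = @timed [rand(1000) for _ in 1:10_000]
println("GC fraction: ", round(100 * stats.gctime / stats.time; digits=1), "%")
```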

Edit: changed using T[2:end] in the loop to preparing T2 = 2:10000 and using that instead, as correctly suggested by @Jeff_Emanuel.

This makes a copy. Try using a view instead, or better, just initialize a T2 = 2:10000.


Thanks for pointing out that stupid mistake, I’ve fixed it now (and I’ll edit it into the initial post so as not to confuse future readers).

However, this does not change the timing or GC usage in a meaningful way for larger problem sizes (n \geq 1000): only a few seconds of difference, and still approximately 25% of the total time spent in garbage collection.