Memory consumption growth with many large MILPs in JuMP

This is probably not helpful, since you have already tried many things, but I had a problem with memory growing in a parallel application and, initially, I used something like this as a workaround:

        if istaskdone(t[ispawn])
          if options.GC && (Sys.free_memory() / Sys.total_memory() < options.GC_threshold)
            GC.gc() # why we need this anyway??? There should not be so much garbage.
          end
        end

meaning that I ran the garbage collector after each task finished, whenever the memory usage was too high (but I had complete control over the code).
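For anyone who wants to try the same workaround, here is a minimal self-contained sketch of the idea. The helper name `gc_if_low_memory` and the 10% threshold are my own placeholders, not from the original code; only `Sys.free_memory`, `Sys.total_memory`, and `GC.gc` are real Base functions.

```julia
# Hypothetical helper: force a full collection only when the fraction of
# free system memory drops below `threshold`. Returns true if GC was run.
function gc_if_low_memory(threshold::Float64 = 0.1)
    free_fraction = Sys.free_memory() / Sys.total_memory()
    if free_fraction < threshold
        GC.gc()
        return true
    end
    return false
end

# Spawn a few allocating tasks (a stand-in for the real workload) and
# check the memory situation as each one finishes.
tasks = Task[]
for _ in 1:4
    t = Threads.@spawn sum(rand(10^6))
    push!(tasks, t)
end
for t in tasks
    wait(t)
    gc_if_low_memory(0.1)
end
```

Note that this only treats the symptom; a forced `GC.gc()` is a full collection and can be slow if it is triggered often.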

That being said (since you already tried things like that), I eventually found a type instability in my code that was the cause of that memory leak, and fixing it solved the overall problem (it was a tricky one given what I knew at the time). So, if you have not already exhausted this possibility, I would suggest carefully checking that everything is type stable where it should be.
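To illustrate what a type instability looks like (a toy example, not the actual bug I had), here is a sketch with two hypothetical functions, `unstable_sum` and `stable_sum`. The only difference is the initial value of the accumulator.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# Type-unstable: `total` starts as an Int and becomes a Float64 in the loop,
# so the compiler infers a Union for it and for the return value.
function unstable_sum(xs::Vector{Float64})
    total = 0
    for x in xs
        total += x
    end
    return total
end

# Type-stable: `total` is a Float64 from the start.
function stable_sum(xs::Vector{Float64})
    total = 0.0
    for x in xs
        total += x
    end
    return total
end

# In the REPL the unstable variables are highlighted in the first output.
@code_warntype unstable_sum([1.0, 2.0])
@code_warntype stable_sum([1.0, 2.0])
```

A small `Union` like this one is usually cheap; the cases that hurt are the ones where something is inferred as `Any` inside a hot loop, because every operation on it may then allocate.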

Just saw this in the Julia manual, in Multi-Threading · The Julia Language:

Compute-bound, non-memory-allocating tasks can prevent garbage collection from running in other threads that are allocating memory. In these cases it may be necessary to insert a manual call to GC.safepoint() to allow GC to run. This limitation will be removed in the future.
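A minimal sketch of what that could look like in practice. The function `busy_sum` and the interval of 10,000 iterations are made up for illustration; `GC.safepoint()` is the call the manual refers to.

```julia
# A compute-bound loop that never allocates. Without a safepoint, a loop
# like this may keep other threads waiting when they need to run the GC.
function busy_sum(n::Int)
    acc = 0.0
    for i in 1:n
        acc += sin(i)
        # Every so often, give the GC a chance to run on behalf of other threads.
        if i % 10_000 == 0
            GC.safepoint()
        end
    end
    return acc
end

busy_sum(10^6)
```

This only matters when Julia tasks are running on several Julia threads; it does not affect threads created internally by an external library.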

May be totally unrelated, but is it intended that all AxisArray fields in that struct are abstractly typed? What does @code_warntype of your job endpoint and reopt_run look like? I seem to remember there being problems with threaded type-unstable code leading to a lot of allocations…
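To show what I mean by abstractly typed fields, here is a sketch with two hypothetical structs (`LooseResult` and `TightResult` are invented names, not types from your code). The parametric version keeps the field concretely typed for each instance.

```julia
# Field declared with an abstract type: the compiler cannot know the
# concrete array type when it reads `r.values`.
struct LooseResult
    values::AbstractVector
end

# Parametric version: the concrete array type is part of the struct type.
struct TightResult{V<:AbstractVector{Float64}}
    values::V
end

loose = LooseResult([1.0, 2.0])
tight = TightResult([1.0, 2.0])

# Check whether the field type is concrete for each instance.
println(isconcretetype(fieldtype(typeof(loose), :values)))
println(isconcretetype(fieldtype(typeof(tight), :values)))
```

The same pattern should apply to `AxisArray` fields: adding type parameters to the struct lets each field carry its concrete type.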

I will check that but we are seeing the memory leak in the knapsack.jl problem as well (which does not have any structs).

Thank you for the tip! However, our app is not doing the threading so we do not have control over the tasks. The solver (CPLEX or Xpress checked so far) is called by JuMP (or MathOptInterface?), and the solver uses multiple threads. Also, I do not think that it is a type instability leading to the memory leak because the memory leak occurs with the knapsack.jl problem (unless the type instability is in JuMP, MathOptInterface, or another dependency like MutableArithmetics).

The problem I had in my case was very similar to that one found by Sukera above (just to mention, I know that it is not related, at least completely, to your problem).

Maybe you can try to track down whether there is a type instability somewhere by following the function calls down the code. In case you do not know (I learned this not long ago), you can use @code_warntype on inner function calls by writing Main.@code_warntype anywhere in the code. Something like:

function test(x)
   y = inner_function(x)
   return y
end

with:

julia> function test(x)
          Main.@code_warntype inner_function(x) # check inner_function call
          y = inner_function(x)
          return y
       end
test (generic function with 1 method)

Ok thank you I will try Main.@code_warntype on our inner function calls and report back what I find. I have never used it before so thank you for the example!

Should I be concerned by the many Anys? Or is there something specific to look for regarding the type stability issue and multi-threading?

Variables
  #self#::Core.Const(JuMP.var"@variable")
  __source__::LineNumberNode
  __module__::Module
  args@_4::Tuple{Symbol, Expr, Symbol}
  @_5::Int64
  #107::JuMP.var"#107#116"
  @_7::Any
  #106::JuMP.var"#106#115"
  #105::JuMP.var"#105#114"
  #104::JuMP.var"#104#113"
  @_11::Int64
  #103::JuMP.var"#103#112"
  #102::JuMP.var"#102#111"
  #101::JuMP.var"#101#110"
  #100::JuMP.var"#100#109"
  @_16::Int64
  macro_code::Expr
  creation_code::Expr
  variablecall::Expr
  buildcall::Expr
  scalar_variables::Expr
  name_code::Any
  indices::Expr
  idxvars::Vector{Any}
  info::Expr
  extra::Any
  set::Any
  base_name::Any
  name::Any
  variable::Symbol
  anonvar::Bool
  var::Any
  explicit_comparison::Bool
  infoexpr::JuMP._VariableInfoExpr
  set_kw_args::Any
  variable_type_kw_args::Any
  base_name_kw_args::Any
  extra_kw_args::Any
  info_kw_args::Any
  anon_singleton::Bool
  x::Any
  requestedcontainer::Any
  kw_args::Any
  model::Expr
  _error::JuMP.var"#_error#108"{LineNumberNode}
  ex::Any
  args@_47::Union{}
  args@_48::Union{}
  args@_49::Union{}
  args@_50::Union{}
  args@_51::Union{}
  args@_52::Union{}
  args@_53::Union{Core.Box, Tuple{Symbol, Expr, Symbol}}
  @_54::JuMP._VariableInfoExpr
  @_55::Bool
  @_56::Bool
  @_57::Bool
  @_58::Any
  @_59::Bool

Body::Expr
1 ───        (args@_53 = args@_4)
│            (args@_53 = Core.Box(args@_53::Tuple{Symbol, Expr, Symbol}))
│            Core.NewvarNode(:(@_5))
│            Core.NewvarNode(:(#107))
│            Core.NewvarNode(:(@_7))
│            Core.NewvarNode(:(#106))
│            Core.NewvarNode(:(#105))
│            Core.NewvarNode(:(#104))
│            Core.NewvarNode(:(@_11))
│            Core.NewvarNode(:(#103))
│            Core.NewvarNode(:(#102))
│            Core.NewvarNode(:(#101))
│            Core.NewvarNode(:(#100))
│            Core.NewvarNode(:(@_16))
│            Core.NewvarNode(:(macro_code))
│            Core.NewvarNode(:(creation_code))
│            Core.NewvarNode(:(variablecall))
│            Core.NewvarNode(:(buildcall))
│            Core.NewvarNode(:(scalar_variables))
│            Core.NewvarNode(:(name_code))
│            Core.NewvarNode(:(indices))
│            Core.NewvarNode(:(idxvars))
│            Core.NewvarNode(:(info))
│            Core.NewvarNode(:(extra))
│            Core.NewvarNode(:(set))
│            Core.NewvarNode(:(base_name))
│            Core.NewvarNode(:(name))
│            Core.NewvarNode(:(variable))
│            Core.NewvarNode(:(anonvar))
│            Core.NewvarNode(:(var))
│            Core.NewvarNode(:(explicit_comparison))
│            Core.NewvarNode(:(infoexpr))
│            Core.NewvarNode(:(set_kw_args))
│            Core.NewvarNode(:(variable_type_kw_args))
│            Core.NewvarNode(:(base_name_kw_args))
│            Core.NewvarNode(:(extra_kw_args))
│            Core.NewvarNode(:(info_kw_args))
│            Core.NewvarNode(:(anon_singleton))
│            Core.NewvarNode(:(x))
│            Core.NewvarNode(:(requestedcontainer))
│            Core.NewvarNode(:(kw_args))
│            Core.NewvarNode(:(model))
│     %43  = JuMP.:(var"#_error#108")::Core.Const(JuMP.var"#_error#108")
│     %44  = Core.typeof(__source__)::Core.Const(LineNumberNode)
│     %45  = Core.apply_type(%43, %44)::Core.Const(JuMP.var"#_error#108"{LineNumberNode})
│            (_error = %new(%45, __source__, args@_53::Core.Box))
│     %47  = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #3 if not %47
2 ───        goto #4
3 ───        Core.NewvarNode(:(args@_47))
└────        args@_47
4 ┄── %52  = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %53  = JuMP._reorder_parameters(%52)::Any
│            Core.setfield!(args@_53::Core.Box, :contents, %53)
│     %55  = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #6 if not %55
5 ───        goto #7
6 ───        Core.NewvarNode(:(args@_48))
└────        args@_48
7 ┄── %60  = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %61  = Base.getindex(%60, 1)::Any
│            (model = JuMP.esc(%61))
│     %63  = JuMP.Containers._extract_kw_args::Core.Const(JuMP.Containers._extract_kw_args)
│     %64  = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #9 if not %64
8 ───        goto #10
9 ───        Core.NewvarNode(:(args@_49))
└────        args@_49
10 ┄─ %69  = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %70  = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #12 if not %70
11 ──        goto #13
12 ──        Core.NewvarNode(:(args@_50))
└────        args@_50
13 ┄─ %75  = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %76  = Base.lastindex(%75)::Any
│     %77  = (2:%76)::Any
│     %78  = Base.getindex(%69, %77)::Any
│     %79  = (%63)(%78)::Tuple{Any, Any, Any}
│     %80  = Base.indexed_iterate(%79, 1)::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(2)])
│            (extra = Core.getfield(%80, 1))
│            (@_16 = Core.getfield(%80, 2))
│     %83  = Base.indexed_iterate(%79, 2, @_16::Core.Const(2))::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(3)])
│            (kw_args = Core.getfield(%83, 1))
│            (@_16 = Core.getfield(%83, 2))
│     %86  = Base.indexed_iterate(%79, 3, @_16::Core.Const(3))::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(4)])
│            (requestedcontainer = Core.getfield(%86, 1))
│     %88  = JuMP.length(extra)::Any
│     %89  = (%88 == 0)::Any
└────        goto #15 if not %89
14 ──        (x = JuMP.gensym())
│            (anon_singleton = true)
└────        goto #22
15 ──        (x = JuMP.popfirst!(extra))
│     %95  = (x == :Int)::Union{Missing, Bool}
└────        goto #17 if not %95
16 ── %97  = Base.string("Ambiguous variable name ", x, " detected. To specify an anonymous integer ")::String
│     %98  = (%97 * "variable, use `@variable(model, integer = true)` instead.")::String
│            (_error)(%98)
└────        Core.Const(:(goto %111))
17 ┄─ %101 = (x == :Bin)::Union{Missing, Bool}
└────        goto #19 if not %101
18 ── %103 = Base.string("Ambiguous variable name ", x, " detected. To specify an anonymous binary ")::String
│     %104 = (%103 * "variable, use `@variable(model, binary = true)` instead.")::String
│            (_error)(%104)
└────        Core.Const(:(goto %111))
19 ┄─ %107 = (x == :PSD)::Union{Missing, Bool}
└────        goto #21 if not %107
20 ── %109 = ("Size of anonymous square matrix of positive semidefinite anonymous variables is not specified. To specify size of square matrix " * "use `@variable(model, [1:n, 1:n], PSD)` instead.")::String
└────        (_error)(%109)
21 ┄─        (anon_singleton = false)
22 ┄─        (info_kw_args = JuMP.filter(JuMP._is_info_keyword, kw_args))
│            (#100 = %new(JuMP.:(var"#100#109")))
│     %114 = #100::Core.Const(JuMP.var"#100#109"())
│            (extra_kw_args = JuMP.filter(%114, kw_args))
│            (#101 = %new(JuMP.:(var"#101#110")))
│     %117 = #101::Core.Const(JuMP.var"#101#110"())
│            (base_name_kw_args = JuMP.filter(%117, kw_args))
│            (#102 = %new(JuMP.:(var"#102#111")))
│     %120 = #102::Core.Const(JuMP.var"#102#111"())
│            (variable_type_kw_args = JuMP.filter(%120, kw_args))
│            (#103 = %new(JuMP.:(var"#103#112")))
│     %123 = #103::Core.Const(JuMP.var"#103#112"())
│            (set_kw_args = JuMP.filter(%123, kw_args))
│     %125 = Base.NamedTuple()::Core.Const(NamedTuple())
│     %126 = Base.broadcasted(JuMP._keywordify, info_kw_args)::Any
│     %127 = Base.materialize(%126)::Any
│     %128 = Base.merge(%125, %127)::Any
│     %129 = Base.isempty(%128)::Any
└────        goto #24 if not %129
23 ──        (@_54 = JuMP._VariableInfoExpr())
└────        goto #25
24 ── %133 = Core.kwfunc(JuMP._VariableInfoExpr)::Core.Const(Core.var"#Type##kw"())
└────        (@_54 = (%133)(%128, JuMP._VariableInfoExpr))
25 ┄─        (infoexpr = @_54)
│     %136 = JuMP.isexpr(x, :comparison)::Bool
└────        goto #27 if not %136
26 ──        (@_55 = %136)
└────        goto #28
27 ──        (@_55 = JuMP.isexpr(x, :call))
28 ┄─        (explicit_comparison = @_55)
└────        goto #30 if not explicit_comparison
29 ── %143 = Core.tuple(_error, infoexpr)::Tuple{JuMP.var"#_error#108"{LineNumberNode}, JuMP._VariableInfoExpr}
│     %144 = Base.getproperty(x, :args)::Any
│     %145 = Core._apply_iterate(Base.iterate, JuMP.parse_variable, %143, %144)::Tuple{Any, Any}
│     %146 = Base.indexed_iterate(%145, 1)::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(2)])
│            (var = Core.getfield(%146, 1))
│            (@_11 = Core.getfield(%146, 2))
│     %149 = Base.indexed_iterate(%145, 2, @_11::Core.Const(2))::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(3)])
│            (set = Core.getfield(%149, 1))
└────        goto #31
30 ──        (var = x)
└────        (set = JuMP.nothing)
31 ┄─ %154 = JuMP.isexpr(var, :vect)::Bool
└────        goto #33 if not %154
32 ──        (@_56 = %154)
└────        goto #37
33 ── %158 = JuMP.isexpr(var, :vcat)::Bool
└────        goto #35 if not %158
34 ──        (@_57 = %158)
└────        goto #36
35 ──        (@_57 = anon_singleton)
36 ┄─        (@_56 = @_57)
37 ┄─        (anonvar = @_56)
└────        goto #41 if not anonvar
38 ──        goto #41 if not explicit_comparison
39 ── %167 = (set === JuMP.nothing)::Bool
└────        goto #41 if not %167
40 ──        (_error)("Cannot use explicit bounds via >=, <= with an anonymous variable")
41 ┄─        (variable = JuMP.gensym())
│     %171 = JuMP.Containers._get_name::Core.Const(JuMP.Containers._get_name)
│            (name = (%171)(var))
│     %173 = JuMP.isempty(base_name_kw_args)::Any
└────        goto #46 if not %173
42 ──        goto #44 if not anonvar
43 ──        (@_58 = "")
└────        goto #45
44 ──        (@_58 = JuMP.string(name))
45 ┄─        (base_name = @_58)
└────        goto #47
46 ── %181 = Base.getindex(base_name_kw_args, 1)::Any
│     %182 = Base.getproperty(%181, :args)::Any
│     %183 = Base.getindex(%182, 2)::Any
└────        (base_name = JuMP.esc(%183))
47 ┄─ %185 = (name isa JuMP.Symbol)::Bool
│     %186 = !%185::Bool
└────        goto #50 if not %186
48 ── %188 = !anonvar::Bool
└────        goto #50 if not %188
49 ── %190 = Base.error::Core.Const(error)
│     %191 = Base.string("Expression ", name, " should not be used as a variable name. Use the \"anonymous\" syntax ", name, " = @variable(model, ...) instead.")::String
└────        (%190)(%191)
50 ┄─ %193 = JuMP.isempty(set_kw_args)::Any
│     %194 = !%193::Any
└────        goto #56 if not %194
51 ── %196 = JuMP.length(set_kw_args)::Any
│     %197 = (%196 > 1)::Any
└────        goto #53 if not %197
52 ── %199 = JuMP.length(set_kw_args)::Any
│     %200 = Base.string("`set` keyword argument was given ", %199, " times.")::String
└────        (_error)(%200)
53 ┄─ %202 = (set !== JuMP.nothing)::Bool
└────        goto #55 if not %202
54 ── %204 = Base.string("Cannot specify set twice, it was already set to `", set, "` so the `set` keyword argument is not allowed.")::String
└────        (_error)(%204)
55 ┄─ %206 = Base.getindex(set_kw_args, 1)::Any
│     %207 = Base.getproperty(%206, :args)::Any
│     %208 = Base.getindex(%207, 2)::Any
└────        (set = JuMP.esc(%208))
56 ┄─        (#104 = %new(JuMP.:(var"#104#113")))
│     %211 = #104::Core.Const(JuMP.var"#104#113"())
│     %212 = JuMP.any(%211, extra)::Any
└────        goto #60 if not %212
57 ── %214 = (set !== JuMP.nothing)::Bool
└────        goto #59 if not %214
58 ── %216 = Base.string("Cannot specify set twice, it was already set to `", set, "` so the `PSD` argument is not allowed.")::String
└────        (_error)(%216)
59 ┄─        (set = $(Expr(:copyast, :($(QuoteNode(:(JuMP.PSDCone())))))))
60 ┄─        (#105 = %new(JuMP.:(var"#105#114")))
│     %220 = #105::Core.Const(JuMP.var"#105#114"())
│     %221 = JuMP.any(%220, extra)::Any
└────        goto #64 if not %221
61 ── %223 = (set !== JuMP.nothing)::Bool
└────        goto #63 if not %223
62 ── %225 = Base.string("Cannot specify `Symmetric` when the set is already specified, the variable is constrained to belong to `", set, "`.")::String
└────        (_error)(%225)
63 ┄─        (set = $(Expr(:copyast, :($(QuoteNode(:(JuMP.SymMatrixSpace())))))))
64 ┄─        (#106 = %new(JuMP.:(var"#106#115")))
│     %229 = #106::Core.Const(JuMP.var"#106#115"())
│            (extra = JuMP.filter(%229, extra))
│     %231 = extra::Any
│            (@_7 = Base.iterate(%231))
│     %233 = (@_7 === nothing)::Bool
│     %234 = Base.not_int(%233)::Bool
└────        goto #71 if not %234
65 ┄─ %236 = @_7::Any
│            (ex = Core.getfield(%236, 1))
│     %238 = Core.getfield(%236, 2)::Any
│     %239 = (ex == :Int)::Union{Missing, Bool}
└────        goto #67 if not %239
66 ──        JuMP._set_integer_or_error(_error, infoexpr)
└────        goto #69
67 ── %243 = (ex == :Bin)::Union{Missing, Bool}
└────        goto #69 if not %243
68 ──        JuMP._set_binary_or_error(_error, infoexpr)
69 ┄─        (@_7 = Base.iterate(%231, %238))
│     %247 = (@_7 === nothing)::Bool
│     %248 = Base.not_int(%247)::Bool
└────        goto #71 if not %248
70 ──        goto #65
71 ┄─        (#107 = %new(JuMP.:(var"#107#116")))
│     %252 = #107::Core.Const(JuMP.var"#107#116"())
│     %253 = JuMP.filter(%252, extra)::Any
│     %254 = Base.broadcasted(JuMP.esc, %253)::Any
│            (extra = Base.materialize(%254))
│     %256 = JuMP.isempty(variable_type_kw_args)::Any
│     %257 = !%256::Any
└────        goto #73 if not %257
72 ── %259 = extra::Any
│     %260 = Base.getindex(variable_type_kw_args, 1)::Any
│     %261 = Base.getproperty(%260, :args)::Any
│     %262 = Base.getindex(%261, 2)::Any
│     %263 = JuMP.esc(%262)::Expr
└────        JuMP.push!(%259, %263)
73 ┄─        (info = JuMP._constructor_expr(infoexpr))
│     %266 = (var isa JuMP.Symbol)::Bool
└────        goto #75 if not %266
74 ──        (name_code = base_name)
└────        goto #88
75 ── %270 = (var isa JuMP.Expr)::Bool
└────        goto #77 if not %270
76 ──        goto #78
77 ── %273 = Base.string("Expected ", var, " to be a variable name")::String
└────        (_error)(%273)
78 ┄─ %275 = JuMP.Containers._build_ref_sets::Core.Const(JuMP.Containers._build_ref_sets)
│     %276 = _error::JuMP.var"#_error#108"{LineNumberNode}
│     %277 = (%275)(%276, var::Expr)::Tuple{Vector{Any}, Expr}
│     %278 = Base.indexed_iterate(%277, 1)::Core.PartialStruct(Tuple{Vector{Any}, Int64}, Any[Vector{Any}, Core.Const(2)])
│            (idxvars = Core.getfield(%278, 1))
│            (@_5 = Core.getfield(%278, 2))
│     %281 = Base.indexed_iterate(%277, 2, @_5::Core.Const(2))::Core.PartialStruct(Tuple{Expr, Int64}, Any[Expr, Core.Const(3)])
│            (indices = Core.getfield(%281, 1))
│     %283 = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #80 if not %283
79 ──        goto #81
80 ──        Core.NewvarNode(:(args@_51))
└────        args@_51
81 ┄─ %288 = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %289 = Base.getindex(%288, 1)::Any
│     %290 = (%289 in idxvars)::Union{Missing, Bool}
└────        goto #86 if not %290
82 ── %292 = Core.isdefined(args@_53::Core.Box, :contents)::Bool
└────        goto #84 if not %292
83 ──        goto #85
84 ──        Core.NewvarNode(:(args@_52))
└────        args@_52
85 ┄─ %297 = Core.getfield(args@_53::Core.Box, :contents)::Any
│     %298 = Base.getindex(%297, 1)::Any
│     %299 = Base.string("Index ", %298, " is the same symbol as the model. Use a ")::String
│     %300 = (%299 * "different name for the index.")::String
└────        (_error)(%300)
86 ┄─        (name_code = JuMP._name_call(base_name, idxvars))
│     %303 = (set !== JuMP.nothing)::Bool
└────        goto #88 if not %303
87 ── %305 = JuMP.Containers.container_code::Core.Const(JuMP.Containers.container_code)
│     %306 = idxvars::Vector{Any}
│     %307 = indices::Expr
│     %308 = name_code::Any
└────        (name_code = (%305)(%306, %307, %308, requestedcontainer))
88 ┄─ %310 = Core.tuple(:call, :build_variable, _error, info)::Core.PartialStruct(Tuple{Symbol, Symbol, JuMP.var"#_error#108"{LineNumberNode}, Expr}, Any[Core.Const(:call), Core.Const(:build_variable), JuMP.var"#_error#108"{LineNumberNode}, Expr])
│            (buildcall = Core._apply_iterate(Base.iterate, Core._expr, %310, extra))
│            JuMP._add_kw_args(buildcall, extra_kw_args)
│     %313 = (set !== JuMP.nothing)::Bool
└────        goto #93 if not %313
89 ── %315 = (var::Union{Expr, Symbol} isa JuMP.Symbol)::Bool
└────        goto #91 if not %315
90 ──        (scalar_variables = buildcall)
└────        goto #92
91 ── %319 = JuMP.Containers.container_code::Core.Const(JuMP.Containers.container_code)
│     %320 = idxvars::Vector{Any}
│     %321 = indices::Expr
│     %322 = buildcall::Expr
└────        (scalar_variables = (%319)(%320, %321, %322, requestedcontainer))
92 ┄─        (buildcall = Core._expr(:call, :build_variable, _error, scalar_variables, set))
93 ┄─        (variablecall = Core._expr(:call, :add_variable, model, buildcall, name_code))
│     %326 = (var::Union{Expr, Symbol} isa JuMP.Symbol)::Bool
└────        goto #95 if not %326
94 ──        (@_59 = %326)
└────        goto #96
95 ──        (@_59 = set !== JuMP.nothing)
96 ┄─        goto #98 if not @_59
97 ──        (creation_code = variablecall)
└────        goto #99
98 ── %334 = JuMP.Containers.container_code::Core.Const(JuMP.Containers.container_code)
│     %335 = idxvars::Vector{Any}
│     %336 = indices::Expr
│     %337 = variablecall::Expr
└────        (creation_code = (%334)(%335, %336, %337, requestedcontainer))
99 ┄─        goto #101 if not anonvar
100 ─        (macro_code = creation_code)
└────        goto #102
101 ─ %342 = (:model_for_registering,)::Core.Const((:model_for_registering,))
│     %343 = Core.apply_type(Core.NamedTuple, %342)::Core.Const(NamedTuple{(:model_for_registering,), T} where T<:Tuple)
│     %344 = Core.tuple(model)::Tuple{Expr}
│     %345 = (%343)(%344)::NamedTuple{(:model_for_registering,), Tuple{Expr}}
│     %346 = Core.kwfunc(JuMP._macro_assign_and_return)::Core.Const(JuMP.var"#_macro_assign_and_return##kw"())
│     %347 = creation_code::Expr
│     %348 = variable::Symbol
└────        (macro_code = (%346)(%345, JuMP._macro_assign_and_return, %347, %348, name))
102 ┄ %350 = JuMP._finalize_macro(model, macro_code, __source__)::Expr
└────        return %350

I don’t know much about how JuMP works, but what you have there is a function that returns an expression. I guess that during model building things are often not concretely typed. What matters is that there are no instabilities in the number-crunching routines.

You're looking at the macro creation code, not the actual runtime code.
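A small sketch of the distinction, using a made-up macro `@double` instead of JuMP's `@variable`. Calling `@code_warntype` directly on a macro call analyses the function that builds the expression (which works on `Expr` and `Any` values, hence all the `Any`s). To inspect the code that runs at solve time, wrap the macro call in a function and analyse a call to that function.

```julia
using InteractiveUtils  # provides @code_warntype and @macroexpand outside the REPL

# Hypothetical macro: rewrites `@double ex` into `2 * ex` at parse time.
macro double(ex)
    return :(2 * $(esc(ex)))
end

# A function whose body uses the macro; this is the runtime code.
f(x) = @double x

# Show the expression the macro generates (what actually gets compiled).
println(@macroexpand @double y)

# Analyse the runtime code, not the macro itself.
@code_warntype f(3)
```

For JuMP that would mean putting the model-building code in a function and running `@code_warntype` on a call to that function.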

I found this issue that matches the description of my issue, but it looks like a fix was merged into master by following the issue links:

  1. data race in GC alloc counters · Issue #27173 · JuliaLang/julia · GitHub
  2. make GC counters thread-local by JeffBezanson · Pull Request #32217 · JuliaLang/julia · GitHub

However, my issue is that a subprocess called by Julia is multithreading, and I can get the memory leak even with JULIA_NUM_THREADS=1. The only “fix” so far has been to set Xpress THREADS=1, but this causes our API to be unbearably slow. I guess I should raise an issue on the Julia repo? Does anyone have thoughts on this? Is there a way to make an MWE of calling a multithreading subprocess without JuMP+solver?

BTW I tried GC.safepoint() in the global scope and function scope with no changes to the memory growth (using the knapsack.jl in a loop).

So if you set THREADS=1, all those plots with growing memory become very stable?
Maybe you can post them here for reference.

Or maybe the program is just so slow now that it does not have time to reach the memory growth ramp?

Any chance you can plot against a minimum number of iterations?

See the chart above in my comment with the header “3. JuMP+Xpress JULIA_NUM_THREADS=1 and Xpress THREADS=1”. Would it be informative to run more cases with THREADS=1?

I am using stress-ng and the Base.run command to recreate a single threaded Julia container calling a multithreaded subprocess. I will update here when I have some charts to share.

Adding a StackOverflow post with the same issue (unanswered):
https://stackoverflow.com/questions/52380799/memory-doesnt-get-freed-in-multiple-threads

Here is another related post (unanswered):

It is interesting that all the other examples have the allocations done by Julia. In those cases it makes sense that the GC is not freeing things.

What is weird about your problem is that the allocations are done by Xpress/CPLEX. Should they be freeing the memory?

There is no Julia code running on threads, right? Perhaps callbacks?

Have you already opened issues with FICO/IBM?

When I ran a roughly equivalent problem in Mosel (knapsack.mos above) I did not see any memory growth. We also did not have any memory growth issues when we used Python to call Mosel (via subprocess). It seems like the memory issue is related to threading and garbage collection in Julia, but I am not a computer scientist so this is just a hunch based on our problem and the others seeing the same weird memory issue. I think that it is not really a “leak” per se because the memory use seems to plateau (see my first post). And others have noted the plateau behavior, as well as some improvement by adding GC.gc() (see links above).

I have been trying to find a MWE that does not use any Julia packages, like this one:

Repeated just now with Julia 1.6.1:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_HOST = 127.0.0.1

julia> function foo(N)
         Threads.@threads for i in 1:N
           sum(collect(1:10^6))
         end
       end
foo (generic function with 1 method)

julia> @time foo(10^5)
166.957135 seconds (308.26 k allocations: 745.072 GiB, 54.93% gc time, 0.01% compilation time)

I also tried running foo with GC.gc() after the sum, but I killed it after ~15 minutes. However, this MWE does not lead to memory growth. (But maybe the slow down with GC.gc() is another issue.)

Running

for _ in range(1, stop=100)
    run(`./stress.sh`)
    GC.gc()
end

where stress.sh is (an attempt to mimic a solver):

#!/bin/bash
stress-ng -t 30s --matrix 4 --mmap-bytes 6g --mmap 4 --copy-file 2 --copy-file-bytes 1g

Gives:
[Chart: stress-ng run with -t 30s, --matrix 4, --mmap-bytes 6g, --mmap 4, --copy-file 2, --copy-file-bytes 1g]
I have been trying other combinations of stress-ng settings but I have not found one particular operation that leads to consistent memory growth.
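In case it helps others reproduce this kind of experiment, here is a minimal sketch of logging memory figures after each iteration. The helper `track_memory` is a hypothetical name, and the child process is a throwaway Julia one-liner standing in for `stress.sh` or a solver. Two caveats, as far as I understand them: `Sys.maxrss()` is the peak resident set size of the Julia process itself, so it never decreases and does not include a child process's memory, while `Sys.free_memory()` is system-wide.

```julia
# Hypothetical helper: call `work` n times and record memory figures
# (in MiB) after each call, forcing a collection first.
function track_memory(work, n::Int)
    Record = NamedTuple{(:iteration, :maxrss_mib, :free_mib), Tuple{Int, Float64, Float64}}
    records = Record[]
    for i in 1:n
        work()
        GC.gc()
        push!(records, (iteration = i,
                        maxrss_mib = Sys.maxrss() / 2^20,
                        free_mib = Sys.free_memory() / 2^20))
    end
    return records
end

# Stand-in for the external program: a short-lived child Julia process.
child = `$(Base.julia_cmd()) --startup-file=no -e "sum(rand(10^6))"`
records = track_memory(() -> run(child), 3)
foreach(println, records)
```

Writing the records to a file per iteration makes it easy to plot growth against iteration count afterwards.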

Returning now to knapsack.jl if I run

using Random, JuMP, Xpress
m = Model(()->Xpress.Optimizer(THREADS=5))
@time knapsack(m, 5000, 5000)

I get

61.578035 seconds (344.06 M allocations: 13.692 GiB, 3.08% gc time)

Is the 13.692 GiB problematic @odow ? I don’t know what to expect for allocations and how they might relate to memory consumption growth over iterations.
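One thing worth keeping in mind when reading that number: the figure printed by `@time` is the cumulative number of bytes allocated during the call, not the peak or resident memory. A minimal sketch of the difference, with a made-up function `churn` (as far as I know these counters only cover memory allocated by Julia, not memory the solver allocates internally):

```julia
# Allocates a fresh ~0.8 MB array on every iteration, but each array
# becomes garbage immediately, so little memory is live at any time.
function churn(n::Int)
    total = 0.0
    for _ in 1:n
        total += sum(rand(10^5))
    end
    return total
end

churn(1)  # run once so compilation is not counted

allocated = @allocated churn(1_000)   # cumulative bytes, roughly 800 MB
GC.gc()
live = Base.gc_live_bytes()           # bytes still reachable after collection

println("cumulative allocations: ", round(allocated / 2^20, digits = 1), " MiB")
println("live after GC:          ", round(live / 2^20, digits = 1), " MiB")
```

So a large total in `@time` says how much work the GC had to do, which is not the same thing as the process footprint growing over iterations.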

Calling GC.gc() can indeed lead to a large slowdown; there is even a warning about that in its docstring.

Since you tried Python calling Mosel, you could try Julia calling Mosel with run. What do you think?

It would be nice to see this graph:

including a version with THREADS=1, so that we can compare with Xpress w/GC.

It might be that THREADS=1 simply allocates less than an unrestricted number of threads. Then the problem would be the GC only.

I’m confused as to what the actual problem is here. As far as I understand it:

  • You want to solve multiple calls to reopt in a single (serial) Julia instance.
  • Each solve uses Xpress, which parallelizes over Xpress.THREADS in the branch-and-bound.
  • Over time, the memory allocated by this Julia instance (which includes the memory allocated by Xpress) increases before plateauing.
  • Sometimes, it hits the docker memory limit and kills the job.

This could be caused by

  • A memory leak of Julia objects
  • A memory leak in Xpress
  • Julia not aggressively freeing memory after a solve

A memory leak in Xpress is unlikely because you saw similar results with Xpress and CPLEX. That they grew at similar rates suggests it is a Julia issue, but the fact that it plateaued suggests that it is not a memory leak in the sense that things are escaping the GC. So that leaves Julia not being aggressive.

The fact that https://github.com/jump-dev/Xpress.jl/issues/128 is a problem suggests that there is some interrelated aspect of the Xpress finalizer that persists across models. That is a good place to start looking.
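For readers unfamiliar with finalizers, here is a generic sketch of the mechanism. `FakeSolverHandle` and `RELEASED` are invented for illustration and say nothing about how Xpress.jl actually manages its handles. The point is that memory owned by an external library is only released when the finalizer runs, which normally happens at some later GC; `finalize` runs it immediately.

```julia
# Counter standing in for "external memory has been released".
const RELEASED = Ref(0)

# Hypothetical wrapper around an external resource.
mutable struct FakeSolverHandle
    id::Int
    function FakeSolverHandle(id::Int)
        handle = new(id)
        # Registered cleanup; a real wrapper would call the library's free function.
        finalizer(h -> (RELEASED[] += 1; nothing), handle)
        return handle
    end
end

function solve_once(id::Int)
    handle = FakeSolverHandle(id)
    # ... build and solve the model here ...
    finalize(handle)   # release right away instead of waiting for a later GC
    return nothing
end

for id in 1:3
    solve_once(id)
end
println("handles released: ", RELEASED[])
```

If the wrapper's cleanup is only triggered by the GC, a process that does few Julia allocations between solves may hold on to solver memory for a long time.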

There are also things you could try:

  • For hard MIPs, you should expect significant memory allocations due to branch-and-bound. Solvers have a variety of options to set if there is a hard upper limit. For example:
    MAXMEMORYSOFT
  • Restart your docker worker every N iterations.
  • Ask FICO for support (Xpress.jl is maintained on a voluntary basis by the community).
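The "restart every N iterations" idea can also be approximated inside Julia with the Distributed standard library, as a rough sketch. The function `run_with_worker_restarts` and its `restart_every` keyword are hypothetical names; in a real setup the worker would also need JuMP and the solver loaded (for example with `@everywhere`) after each restart.

```julia
using Distributed

# Hypothetical helper: run `f(job)` on a worker process, replacing the
# worker with a fresh one every `restart_every` jobs so that whatever
# memory it accumulated is returned to the operating system.
function run_with_worker_restarts(f, jobs; restart_every::Int = 10)
    results = Any[]
    pid = first(addprocs(1))
    for (i, job) in enumerate(jobs)
        push!(results, remotecall_fetch(f, pid, job))
        if i % restart_every == 0 && i < length(jobs)
            rmprocs(pid)              # kill the old worker
            pid = first(addprocs(1))  # start a clean one
        end
    end
    rmprocs(pid)
    return results
end

# Toy usage: squaring numbers stands in for solving a model.
println(run_with_worker_restarts(x -> x^2, 1:5; restart_every = 2))
```

Starting a worker takes a second or two, so this only pays off when each job is long compared with the startup cost.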