I have an MINLP problem that I model in JuMP and solve in parallel using Juniper, with Cbc and Ipopt as the MIP and NLP solvers, respectively. I cannot provide an MWE, but my code works fine for shorter simulations (which might be a hint as to where the issue lies). However, I can give an overall description of the scripts:
- Script 1: optFunc.jl
using JuMP, Juniper, Ipopt, Cbc, NLopt, DataInterpolations, MAT
function optFunc(input1, ..., inputn)
    # some code using JuMP
    @variable
    @constraint
    @objective
    return model
end
- Script 2: mySystem.jl
function mySystem(inputSys1, ..., inputSysn)
    # simple julia code that simulates my system
    return myOutput
end
- Main script: main.jl
@everywhere begin
    using JuMP, Plots # and other packages
    include("optFunc.jl")
    include("mySystem.jl")
    # Define a bunch of variables for later
    # i = 1:myLimit
end
# Note: this is already outside everywhere
for i = 1:myLimit
    # Optimize
    model = optFunc(input1, ..., inputn)
    JuMP.optimize!(model)
    # Store results
    myOptSol = value.(model[:myOptVariable])
    # Simulate my system
    mySimSol = mySystem(myOptSol)
    # Set mySimSol as the initial conditions for the next optimizations
    input1, ..., inputn = mySimSol
end
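For context, the solver stack mentioned at the top is usually assembled along the following lines with Juniper. This is only a sketch: the helper name `build_optimizer` and the option values (the log levels and the number of processors) are assumptions for illustration, not the actual settings used in optFunc.jl.

```julia
using JuMP, Juniper, Ipopt, Cbc

# Hypothetical helper (not taken from optFunc.jl): builds a Juniper optimizer
# that uses Ipopt for the NLP relaxations and Cbc as the MIP solver.
function build_optimizer(n_procs::Int)
    # Assumed option values; the real script may use different ones.
    nl_solver = optimizer_with_attributes(Ipopt.Optimizer, "print_level" => 0)
    mip_solver = optimizer_with_attributes(Cbc.Optimizer, "logLevel" => 0)
    return optimizer_with_attributes(
        Juniper.Optimizer,
        "nl_solver" => nl_solver,
        "mip_solver" => mip_solver,
        "processors" => n_procs,  # number of processes Juniper may use
    )
end

# With n_procs = 1 no worker processes are needed.
model = Model(build_optimizer(1))
```

With `processors` greater than 1, Juniper hands branch-and-bound nodes to the worker processes added with `addprocs`, which would be consistent with the `pmap` and `remotecall_fetch` frames in the stack traces further down.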
In my optimization problem, I have 756 variables, of which 180 are binary. The runtime and memory cost of the simulation are negligible compared with the optimization. In this test myLimit = 150, but it could potentially be much larger.
To run main.jl in parallel I do the following in Julia’s REPL:
1 - cd("mypath")
2 - using Distributed
3 - addprocs(10)
4 - @everywhere using Pkg
5 - @everywhere Pkg.activate("myEnv")
6 - include("main.jl")
When I run this with loose constraints it works fine. However, when I make the constraints tighter and the optimization becomes challenging, some of the workers stop and I get the following:
From worker 3:
From worker 3: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
From worker 3: Exception: EXCEPTION_ACCESS_VIOLATION at 0x6f137880 -- mumps_cst_amf_ at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libmumps_common.dll (unknown line)
From worker 3: in expression starting at none:0
From worker 3: mumps_cst_amf_ at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libmumps_common.dll (unknown line)
From worker 3: __dmumps_ana_aux_m_MOD_dmumps_ana_f at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libdmumps.dll (unknown line)
From worker 3: dmumps_ana_driver_ at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libdmumps.dll (unknown line)
From worker 3: dmumps_ at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libdmumps.dll (unknown line)
From worker 3: .text at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libdmumps.dll (unknown line)
From worker 3: dmumps_c at C:\Users\user\.julia\artifacts\0316fdc27ab249eccb1f8e1c2fc3e111e8477070\bin\libdmumps.dll (unknown line)
From worker 3: _ZN5Ipopt20MumpsSolverInterface21SymbolicFactorizationEv at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt20MumpsSolverInterface10MultiSolveEbPKiS2_iPdbi at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt16TSymLinearSolver10MultiSolveERKNS_9SymMatrixERSt6vectorINS_8SmartPtrIKNS_6VectorEEESaIS8_EERS4_INS5_IS6_EESaISC_EEbi at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt18StdAugSystemSolver10MultiSolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRSt6vectorINS_8SmartPtrIS5_EESaISC_EESF_SF_SF_RSA_INSB_IS4_EESaISG_EESJ_SJ_SJ_bi at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt15AugSystemSolver5SolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRS5_SA_SA_SA_RS4_SB_SB_SB_bi at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt22LeastSquareMultipliers20CalculateMultipliersERNS_6VectorES2_ at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt25DefaultIterateInitializer18least_square_multsERKNS_10JournalistERNS_8IpoptNLPERNS_9IpoptDataERNS_25IpoptCalculatedQuantitiesERKNS_8SmartPtrINS_22EqMultiplierCalculatorEEEd at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt25DefaultIterateInitializer18SetInitialIteratesEv at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt14IpoptAlgorithm18InitializeIteratesEv at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt14IpoptAlgorithm8OptimizeEb at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt16IpoptApplication13call_optimizeEv at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEERNS1_INS_16AlgorithmBuilderEEE at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEE at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: _ZN5Ipopt16IpoptApplication12OptimizeTNLPERKNS_8SmartPtrINS_4TNLPEEE at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: IpoptSolve at C:\Users\user\.julia\artifacts\6b0cdbf534d67d502bb3bdbbbcc79a89bcf10f7f\bin\libipopt-3.dll (unknown line)
From worker 3: solveProblem at C:\Users\user\.julia\packages\Ipopt\vtrOr\src\Ipopt.jl:532
From worker 3: optimize! at C:\Users\user\.julia\packages\Ipopt\vtrOr\src\MOI_wrapper.jl:1713
From worker 3: optimize! at C:\Users\user\.julia\packages\MathOptInterface\YDdD3\src\Bridges\bridge_optimizer.jl:319
From worker 3: #process_node!#84 at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:69
From worker 3: process_node! at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:56 [inlined]
From worker 3: #branch!#85 at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:161
From worker 3: branch! at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:117
From worker 3: unknown function (ip: 0000000006c870b8)
From worker 3: one_branch_step! at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:320
From worker 3: unknown function (ip: 0000000006c6da63)
From worker 3: jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1691 [inlined]
From worker 3: do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:674
From worker 3: #106 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
From worker 3: run_work_thunk at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
From worker 3: macro expansion at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
From worker 3: #105 at .\task.jl:356
From worker 3: unknown function (ip: 00000000533ccfa3)
From worker 3: jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1691 [inlined]
From worker 3: start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:721
From worker 3: Allocations: 218368401 (Pool: 218233746; Big: 134655); GC: 151
After this happens in several workers, I get the error:
ERROR: LoadError: TaskFailedException:
ProcessExitedException(3)
Stacktrace:
[1] try_yieldto(::typeof(Base.ensure_rescheduled)) at .\task.jl:656
[2] wait at .\task.jl:713 [inlined]
[3] wait(::Base.GenericCondition{ReentrantLock}) at .\condition.jl:106
[4] take_buffered(::Channel{Any}) at .\channels.jl:387
[5] take!(::Channel{Any}) at .\channels.jl:381
[6] take!(::Distributed.RemoteValue) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:599
[7] remotecall_fetch(::Function, ::Distributed.Worker, ::Nothing, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:390
[8] remotecall_fetch(::Function, ::Distributed.Worker, ::Nothing, ::Vararg{Any,N} where N) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:386
[9] remotecall_fetch(::Function, ::Int64, ::Nothing, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421
[10] remotecall_fetch(::Function, ::Int64, ::Nothing, ::Vararg{Any,N} where N) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\remotecall.jl:421
[11] macro expansion at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:515 [inlined]
[12] (::Juniper.var"#96#100"{typeof(Juniper.one_branch_step!),Juniper.BnBTreeObj,Float64,Array{Symbol,1},Array{Int64,1},Juniper.TimeObj,Array{Symbol,1},Int64})() at .\task.jl:356
...and 1 more exception(s).
Stacktrace:
[1] sync_end(::Channel{Any}) at .\task.jl:314
[2] macro expansion at .\task.jl:333 [inlined]
[3] pmap(::Function, ::Juniper.BnBTreeObj, ::Array{Any,1}, ::Float64, ::Array{Symbol,1}, ::Array{Int64,1}, ::Juniper.TimeObj) at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:482
[4] solvemip(::Juniper.BnBTreeObj) at C:\Users\user\.julia\packages\Juniper\8wso7\src\BnBTree.jl:601
[5] optimize!(::Juniper.Optimizer) at C:\Users\user\.julia\packages\Juniper\8wso7\src\MOI_wrapper\MOI_wrapper.jl:297
[6] optimize!(::MathOptInterface.Bridges.LazyBridgeOptimizer{Juniper.Optimizer}) at C:\Users\user\.julia\packages\MathOptInterface\YDdD3\src\Bridges\bridge_optimizer.jl:319
[7] optimize!(::MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.AbstractOptimizer,MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.GenericModel{Float64,MathOptInterface.Utilities.ModelFunctionConstraints{Float64}}}}) at C:\Users\user\.julia\packages\MathOptInterface\YDdD3\src\Utilities\cachingoptimizer.jl:252
[8] optimize!(::Model, ::Nothing; bridge_constraints::Bool, ignore_optimize_hook::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\user\.julia\packages\JuMP\Xrr7O\src\optimizer_interface.jl:185
[9] optimize! at C:\Users\user\.julia\packages\JuMP\Xrr7O\src\optimizer_interface.jl:157 [inlined] (repeats 2 times)
[10] macro expansion at .\timing.jl:174 [inlined]
[11] top-level scope at $path\main.jl:72
[12] include(::String) at .\client.jl:457
[13] top-level scope at REPL[6]:1
in expression starting at $path\main.jl:58
These errors seem to appear sooner when I add more workers, which would be consistent with a memory issue. Still, I cannot make sense of them, especially since the code runs perfectly in other, simpler cases. Does this happen because I am running out of memory? If so, is there a way to release memory within the for loop so that I can run larger simulations?
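One way to test the memory hypothesis would be to log free memory and force a garbage collection on every process at the end of each iteration. Below is a minimal sketch using only the Distributed standard library. The helper names `free_gib` and `collect_everywhere` are made up for illustration, and the loop body is a placeholder for the real optimize and simulate steps.

```julia
using Distributed

# Hypothetical helper: free system memory in GiB, as seen from the calling process.
free_gib() = Sys.free_memory() / 2^30

# Hypothetical helper: run a full garbage collection on every process
# (master and workers) and report free memory before and after.
function collect_everywhere()
    before = free_gib()
    for p in procs()
        remotecall_wait(GC.gc, p)  # blocks until process p has finished collecting
    end
    return (before = before, after = free_gib())
end

for i in 1:3
    scratch = rand(10^6)  # placeholder for model = optFunc(...); optimize!(model)
    scratch = nothing     # drop the reference so the collector can reclaim it
    stats = collect_everywhere()
    println("iteration ", i, ": free memory ",
            round(stats.before, digits = 2), " -> ",
            round(stats.after, digits = 2), " GiB")
end
```

If free memory keeps shrinking from one iteration to the next even with the collection in place, that would point toward memory pressure. If it stays roughly flat and the access violation still occurs, the cause is probably elsewhere. Note that `GC.gc()` only reclaims objects Julia no longer references, so the model would have to be dropped (for example `model = nothing`) before collecting.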