Hello everyone,
I am trying to run multiple Gurobi optimizations in parallel inside a genetic algorithm that I wrote myself.
The code is very long and split into many parts, so here I report only its conceptual structure:
using Distributed, SharedArrays # and other packages
addprocs(26)
@everywhere begin
    using JuMP, Gurobi # and other packages
end
@everywhere begin
    # load input data in all the processes
    if !(@isdefined env)
        const env = Gurobi.Env() # one Gurobi environment per process
    end
end
fitness_values = SharedArray{Float64}(26)
function par_OF(population)
    @sync @distributed for i in 1:length(population)
        fitness_values[i] = OF(population[i]) # OF runs the optimization with Gurobi
    end
    return fitness_values
end
population = [rand(15) for n in 1:26] # creation of initial population
conv_param = 1.
while conv_param > 1e-2
    fitness_values = par_OF(population)
    # update population according to fitness_values
    # update conv_param according to fitness_values
end
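For anyone who wants to run the structure above without my data, here is a minimal self-contained sketch. It makes several assumptions that are not in my real code: the objective OF is a hypothetical placeholder (a plain sum of squares, with no Gurobi model), only 2 local workers are started instead of 26, the population has 6 individuals, and the shared array is passed as an argument (par_OF! is a name used only for this sketch).

```julia
using Distributed, SharedArrays

# Assumption: 2 local workers instead of the 26 used in the real code
addprocs(2)

# Hypothetical placeholder objective; the real OF builds and solves a Gurobi model
@everywhere OF(x) = sum(abs2, x)

# Evaluate every individual on the workers and store the results in the shared array
function par_OF!(fitness_values, population)
    @sync @distributed for i in 1:length(population)
        fitness_values[i] = OF(population[i])
    end
    return fitness_values
end

population = [rand(15) for n in 1:6]                        # small test population
fitness_values = SharedArray{Float64}(length(population))   # one slot per individual
par_OF!(fitness_values, population)

# Shut the workers down again; the results stay readable on the main process
rmprocs(workers())
```

This sketch only reproduces the parallel pattern, so it is not expected to trigger the Gurobi crash shown below.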
When I run the code, I get a strange error that occurs at a different iteration in each run:
From worker 20:
From worker 20: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
From worker 20: Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffdc38bcd99 -- GRBrelaxmodel at C:\gurobi912\win64\bin\gurobi91.DLL (unknown line)
From worker 20: in expression starting at none:0
From worker 20: GRBrelaxmodel at C:\gurobi912\win64\bin\gurobi91.DLL (unknown line)
From worker 20: GRBrelaxmodel at C:\gurobi912\win64\bin\gurobi91.DLL (unknown line)
From worker 20: GRBfeasrelax at C:\Users\umbe\.julia\packages\Gurobi\FliRK\src\gen91\libgrb_api.jl:308
From worker 20: unknown function (ip: 000000000219eac5)
From worker 20: UL_OF at C:\Users\umbe\OneDrive\Script\UL\UL_OF.jl:69
From worker 20: OF at C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:64
From worker 20: macro expansion at C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:99 [inlined]
From worker 20: #133 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Distributed\src\macros.jl:303
From worker 20: #178 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Distributed\src\macros.jl:83
From worker 20: unknown function (ip: 000000000219ef23)
From worker 20: jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
From worker 20: do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
From worker 20: #107 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Distributed\src\process_messages.jl:274
From worker 20: run_work_thunk at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Distributed\src\process_messages.jl:63
From worker 20: unknown function (ip: 0000000032bfb526)
From worker 20: run_work_thunk at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Distributed\src\process_messages.jl:72
From worker 20: #100 at .\task.jl:429
From worker 20: unknown function (ip: 0000000032bfb0b3)
From worker 20: jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
From worker 20: start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:877
From worker 20: Allocations: 1417948818 (Pool: 1410181436; Big: 7767382); GC: 1053
Worker 20 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
[1] (::Base.var"#wait_locked#648")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base .\stream.jl:892
[2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base .\stream.jl:900
[3] unsafe_read
@ .\io.jl:724 [inlined]
[4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base .\io.jl:723
[5] read!
@ .\io.jl:725 [inlined]
[6] deserialize_hdr_raw
@ C:\Users\umbe\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Distributed\src\messages.jl:167 [inlined]
[7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\Users\umbe\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Distributed\src\process_messages.jl:165
[8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\Users\umbe\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Distributed\src\process_messages.jl:126
[9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed .\task.jl:429
ERROR: LoadError: Unhandled Task ERROR: ProcessExitedException(20)
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base .\task.jl:381
[2] (::Distributed.var"#177#179"{var"#133#135"{Vector{Any}, SharedVector{Float64}}, UnitRange{Int64}})()
@ Distributed .\task.jl:400
TaskFailedException
nested task error: ProcessExitedException(20)
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base .\task.jl:381
[2] (::Distributed.var"#177#179"{var"#133#135"{Vector{Any}, SharedVector{Float64}}, UnitRange{Int64}})()
@ Distributed .\task.jl:400
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base .\task.jl:381
[2] macro expansion
@ .\task.jl:400 [inlined]
[3] par_OF(population::Vector{Any}, num_elite::Int64, elite_fitness::Vector{Float64}, fitness_values::SharedVector{Float64})
@ Main C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:98
[4] select_parents(population::Vector{Any}, num_parents::Int64, num_elite::Int64, elite_fitness::Vector{Float64}, fitness_values::SharedVector{Float64})
@ Main C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:120
[5] genetic_algorithm(population_size::Int64, num_generations::Int64, num_parents::Int64, mutation_rate::Float64, num_elite::Int64, lb::Vector{Float64}, ub::Vector{Float64}, lc::Vector{Float64}, uc::Vector{Float64}, init_population::Vector{Vector{Float64}}, fitness_values::SharedVector{Float64})
@ Main C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:187
[6] top-level scope
@ C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:222
[7] include(fname::String)
@ Base.MainInclude .\client.jl:451
[8] top-level scope
@ C:\Users\umbe\OneDrive\Script\Main.jl:40
[9] eval
@ .\boot.jl:373 [inlined]
[10] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196
in expression starting at C:\Users\umbe\OneDrive\Script\UL\UL_opt.jl:222
in expression starting at C:\Users\umbe\OneDrive\Script\Main.jl:40
I am having trouble understanding what went wrong. It may be related to solving the relaxed version of the problem (the trace points to GRBfeasrelax), which my code does when the original problem turns out to be infeasible, but I do not know how to fix it.
Here, instead, I found that "Unhandled Task ERROR: EOFError: read end of file" is the error you get when one of the parallel workers hits an error, but again, since I do not know what causes it, I do not know what I should do.
Can anyone help me, please?