Making a collection of JuMP models using pmap

I’m trying to create a number of JuMP models in parallel using pmap. The basic functionality works fine, but my implementation stores some variables and expressions in the ext field of the JuMP.Model. This works fine when building the models sequentially with map() or a for loop, but I’ve run into a problem when I use pmap().

Here’s a very simple example that demonstrates the issue:

using Distributed

if length(workers()) == 1
    addprocs(1)  # ensure at least one worker process exists
end
@everywhere using JuMP

@everywhere function model_builder(rhs)
    m = JuMP.Model()
    @variable(m, x >= 0)
    @objective(m, Min, x)
    @constraint(m, x >= rhs)

    # Stash the variable and an expression built from it in the model's ext dict
    m.ext[:variable] = x
    m.ext[:expr] = 2 * x

    return m
end
# Sequential models (map)
models = map(model_builder, 1:10)
models[1][:x] in keys(models[1].ext[:expr].terms)             # true
collect(keys(models[1].ext[:expr].terms))[1] == models[1][:x] # true
models[1].ext[:expr] + models[1].ext[:variable]               # 3 x
 
# Parallel models (pmap, worker processes)
models = pmap(model_builder, 1:10)
models[1][:x] in keys(models[1].ext[:expr].terms)             # false (!)
collect(keys(models[1].ext[:expr].terms))[1] == models[1][:x] # true
models[1].ext[:expr] + models[1].ext[:variable]               # 2 x + x, not 3 x

The comments above show the output of each line. Everything is normal when using map. With pmap, however, Julia thinks that the x inside the expression and the x variable are different (sort of): the in test against the keys of the terms dictionary returns false, even though == on the collected key returns true. And when I add the expression 2x to the variable x, instead of getting 3x, I get 2x + x.

I’ve found a workaround that fixes the expression after pmap returns the models, but it only works when run sequentially on the main process.

temp = models[1].ext[:expr]
temp_terms = collect(keys(temp.terms))
temp_values = collect(values(temp.terms))
models[1].ext[:expr] = AffExpr(temp.constant)
for i in eachindex(temp_terms)
    models[1].ext[:expr] += temp_values[i] * temp_terms[i]
end

The fact that this fix works confuses me, since it just takes the variables and coefficients from an AffExpr in order to build a new (identical?) AffExpr. Perhaps there’s something about how memory is managed between the worker processes that I don’t understand, but I thought that pmap would handle all of that.

This appears to be a bug in OrderedCollections.

@everywhere begin
    using OrderedCollections
    struct Foo
        x::Vector{Int}  # no custom hash method: hashing falls back to objectid
    end
    function foo(i)
        x = Foo(Int[])
        return x, OrderedDict(x => 2)
    end
end

function bar()
    x, y = pmap(foo, 1:1)[1]
    z = copy(y)
    OrderedCollections.rehash!(z)  # rebuild the hash index from the current keys
    return haskey(y, x), haskey(z, x)
end
a, b = bar()  # false, true ?!?

Issue: https://github.com/JuliaCollections/OrderedCollections.jl/issues/9
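
As I understand it, the mechanism is: these objects have no custom hash method, so they hash by objectid, which is not preserved by serialization; and OrderedDict does not rebuild its hash index when it is deserialized, so lookups against the stale index miss. On versions affected by the issue above, you can reproduce it in a single process with Serialization (the Key type here is just illustrative):

using Serialization, OrderedCollections

mutable struct Key end  # no custom hash method: hashing falls back to objectid

k = Key()
d = OrderedDict(k => 1)

buf = IOBuffer()
serialize(buf, (k, d))     # serialize the key and the dict together
seekstart(buf)
k2, d2 = deserialize(buf)  # k2 is the same object as the key stored in d2

haskey(d2, k2)                  # false: the hash index still reflects the old objectid
OrderedCollections.rehash!(d2)
haskey(d2, k2)                  # true: index rebuilt from the current keys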

As a work-around for your issue, you need to call rehash! on models[1].ext[:expr].terms
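
Something like the following sketch (the terms field of a JuMP AffExpr is an OrderedDict, so rehash! applies directly):

using OrderedCollections

models = pmap(model_builder, 1:10)
for m in models
    OrderedCollections.rehash!(m.ext[:expr].terms)  # rebuild the stale hash index
end

models[1].ext[:expr] + models[1].ext[:variable]  # 3 x, as expected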

Thanks Oscar.

Obviously I’m doing something that no one else has ever done. I must be a pioneer, or just very lost.

I’m not sure if this worked on a previous release, or whether it has always been broken. Typically we construct models on a process and keep them there to avoid the communication overhead. Why construct models and then return them to the main process?

The models are a collection of MIPs (perhaps hundreds of them). Since most solvers support multi-threading for MIPs, I’m not sure there would be much benefit to solving them in parallel (plus, I don’t know enough Julia to do anything other than pmap).

I have one example that was taking 10+ minutes to just formulate the 341 models, and I’ve got that down to 80 seconds by using pmap, as described in this post.

If you have a link to an example of how I could allocate models to processes and keep them there, but still control them from the master process (i.e. change objective coefficients, run optimize!, and get solutions back), that would be the ideal structure.
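
For reference, here is a minimal sketch of that structure. Everything in it (the MODELS dictionary, build!, update_and_solve!, the round-robin assignment) is illustrative, and you would attach your solver in build! via set_optimizer before optimize! can run:

using Distributed
addprocs(2)
@everywhere using JuMP

@everywhere begin
    # Worker-local storage: models are built here and never leave this process.
    const MODELS = Dict{Int,JuMP.Model}()

    function build!(id, rhs)
        m = JuMP.Model()  # attach a solver here, e.g. set_optimizer(m, HiGHS.Optimizer)
        @variable(m, x >= 0)
        @objective(m, Min, x)
        @constraint(m, x >= rhs)
        MODELS[id] = m
        return nothing
    end

    # Mutate and solve in place; only small results cross process boundaries.
    function update_and_solve!(id, c)
        m = MODELS[id]
        set_objective_coefficient(m, m[:x], c)  # change an objective coefficient
        optimize!(m)
        return objective_value(m), value(m[:x])
    end
end

# Round-robin the models over the workers, remembering which worker owns which model.
owner = Dict(i => workers()[mod1(i, nworkers())] for i in 1:10)
for (i, w) in owner
    remotecall_wait(build!, w, i, i)
end

# Drive all the models from the master process; only numbers cross the wire.
results = [remotecall_fetch(update_and_solve!, owner[i], i, 2.0) for i in 1:10]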