Parallel processing

I did some experiments with parallel processing, i.e. solving models in parallel. But the result is not what I expected:

using Distributed
import JuMP


function simple_model(a, b)
    model = JuMP.Model()

    JuMP.@variable(model, 0 <= x <= 2)
    JuMP.@variable(model, 0 <= y <= 30)

    JuMP.@objective(model, Max, a*x + b*y)
    JuMP.@constraint(model, 1x + 5y <= 3.0)
    return model
end

@everywhere import Clp, JuMP
@everywhere function solve_model(model)
    optimizer = Clp.Optimizer
    optimizer_params = Dict("LogLevel" => 0)
    JuMP.set_optimizer(model, optimizer)
    JuMP.set_optimizer_attributes(model, optimizer_params...)
    JuMP.optimize!(model)
end

models = [simple_model(rand(), rand()) for i in 1:1000];

t0 = time()
for i in 1:10
    map(solve_model, models)
end
t1 = time()
println("Serial execution: elapsed time: ", (t1-t0)/10, " seconds")

t0 = time()
for i in 1:10
    pmap(solve_model, models, batch_size=1)
end
t1 = time()
println("Parallel execution: elapsed time: ", (t1-t0)/10, " seconds")

Serial execution: elapsed time: 1.6002999782562255 seconds
Parallel execution: elapsed time: 3.769099998474121 seconds

If I increase the batch_size to 10 I get better results:

Serial execution: elapsed time: 1.470799994468689 seconds
Parallel execution: elapsed time: 1.8813000202178956 seconds

But still, the parallel processing approach is slower.

Am I doing something wrong?

What are the specifications of the server or laptop you are running this on?
How many cores? And is hyperthreading enabled?

4 cores, Core i5, 8GB RAM, yes it is enabled.

A few things.

  • Parallel processing has an overhead, so you will probably only see a speed-up with bigger models or a larger number of models.
  • One issue is that you’re having to copy the model to each proc. Try just passing a and b, and building and solving each model on the remote procs.
  • This won’t help with Clp for some internal reasons, but if you use a solver like GLPK or CPLEX, you can build the model once on each worker, and then just call @objective and solve for each new a and b. Then you’ll only pay the build cost once.
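
The second point can be sketched like this, assuming a recent JuMP/Clp (the worker count, `batch_size`, and the `build_and_solve` name are just illustrative):

```julia
# Sketch: pass only the coefficients (a, b) to the workers and build + solve
# each model there, so no JuMP model has to be serialized and copied.
using Distributed
addprocs(3)  # e.g. 3 extra worker processes on a 4-core machine

@everywhere import JuMP, Clp

@everywhere function build_and_solve(a, b)
    # same model as simple_model above, but built on the worker
    model = JuMP.Model(JuMP.optimizer_with_attributes(Clp.Optimizer, "LogLevel" => 0))
    JuMP.@variable(model, 0 <= x <= 2)
    JuMP.@variable(model, 0 <= y <= 30)
    JuMP.@objective(model, Max, a * x + b * y)
    JuMP.@constraint(model, x + 5y <= 3.0)
    JuMP.optimize!(model)
    return JuMP.objective_value(model)
end

coeffs = [(rand(), rand()) for _ in 1:1000]
results = pmap(c -> build_and_solve(c...), coeffs, batch_size = 100)
```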

This was just an example. Only changing the objective is not enough in my application.

Okay, then that simplifies things: minimize the data movement and pass the data necessary to build the model, not the built JuMP model.

data = [(1, 1), (2, 3)]
@everywhere function build_and_solve(data)
    # stand-in for building and solving a model from the raw data
    return data[1] + data[2]
end

pmap(build_and_solve, data)

Also remember that with hyperthreading, your 4 logical cores correspond to only 2 physical cores.
Once you have followed @odow's excellent advice, I would run with one proc, two procs, and four procs and compare the timings.
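
That comparison can be sketched like this, reusing the `build_and_solve` stub from above (the worker counts and timing method are illustrative; a real test would use the actual model-building function):

```julia
# Sketch: time the same pmap workload with 1, 2, and 4 workers.
# Alternatively, start Julia directly with `julia -p N` for each run.
using Distributed

data = [(rand(), rand()) for _ in 1:1000]

for target in (1, 2, 4)
    # grow the worker pool to the target size (with no workers added,
    # nworkers() == 1 and pmap falls back to the master process)
    while nworkers() < target
        addprocs(1)
    end
    # make sure the function is defined on any newly added workers
    @everywhere build_and_solve(d) = d[1] + d[2]
    t = @elapsed pmap(build_and_solve, data)
    println(target, " worker(s): ", t, " seconds")
end
```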

FWIW, I have had good success using Distributed / pmap and the approach that @odow suggests for setting up and solving the models independently in each worker.

In my case the JuMP model solves were on the order of 10-30 seconds, which was sufficient to outweigh the parallelization overheads.