Performance of computing the expectation of a slow function

Hi! I am trying to optimize this section of code. There are two functions: the first solves a linear programming problem over a convex set, and the second runs a Monte Carlo experiment over S trials of the first. I can see how the second function would be easy to parallelize; however, this snippet is part of a larger loop that I am already parallelizing. Any help would be very much appreciated!

using JuMP, LinearAlgebra, Random, Distributions, BenchmarkTools, MosekTools

const Ι = 250
const J = 50
const σ = 5
const α = 1.0
const S = 500

# Create a matrix of productivities (one column per simulation draw)

const z=exp.(rand(Normal(0,1),Ι,S))

const β=exp.(rand(Normal(0,1),J))./10

C = ones(Ι).*10


# Create a Matrix of Distances

function distmat(J,Ι)
    Distances = zeros(J,Ι)
    Random.seed!(7777)
    coordinates_market = 100*rand(J,2)
    coordinates_plant = 100*rand(Ι,2)
    for j = 1:J, l=1:Ι
        Distances[j,l] = sqrt((coordinates_market[j,1]-coordinates_plant[l,1])^2+(coordinates_market[j,2]-coordinates_plant[l,2])^2)
    end
    return 1 .+ Distances./100
end

const τ = distmat(J,Ι)

# Solve the conic program for one draw of the productivity vector z
function f1(C,β,τ,z)
    (J,Ι) = size(τ)
    opt = Model(optimizer_with_attributes(Mosek.Optimizer, "QUIET" => false, "INTPNT_CO_TOL_DFEAS" => 1e-7))
    set_silent(opt)
    a = ((σ-1)/σ)^(σ-1)/(σ).*β.^(σ)
    @variable(opt, x[1:Ι] >= 0)
    @variable(opt, y[1:J] >= 0)
    @variable(opt, t[1:J] >= 0)
    @objective(opt, Min, x'*C .+ a'*y)

    for i = 1:Ι, j = 1:J
        @constraint(opt, τ[j,i]/z[i] + τ[j,i]*x[i] - t[j] >= 0)
    end

    for j = 1 : J
        @constraint(opt, [y[j],t[j],1] in MOI.PowerCone(1/σ))
    end    

    optimize!(opt)

    return objective_value(opt), value.(x), value.(t)

end 

# Monte Carlo: average the objective value and the x-solution of f1 over the S columns of z
function f2(Cap,β,τ,z)
    (J,Ι) = size(τ)
    Simul = size(z)[2]
    Eπ = 0.0 
    Eν = zeros(Ι)
    for s = 1 : Simul
        temp = f1(Cap,β,τ,z[:,s])./Simul
        Eπ += temp[1]
        Eν += temp[2]
    end  
    func = Eπ .- α*sum(Cap) 
    foc = Eν .- α
    
    return func, foc

end    

Test_Vec = 100 .*rand(Ι)

@btime f1(Test_Vec,β,τ,z)

@btime f2(Test_Vec,β,τ,z)


Am I right in saying that all the LPs share the same feasible set but have different objective functions? In that case, you could use a point that you already know to be feasible as the initial point of each LP (it may be costly for the solver to determine a feasible point on its own).
I would solve one LP at the beginning, figure out a feasible point (for example the solution of that LP), then solve all the other LPs in parallel starting from this feasible point.
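
For example, in JuMP you can pass starting values with set_start_value. A minimal, untested sketch of what I mean (whether the conic solver actually exploits the primal start is solver-dependent):

# Untested sketch of the warm-start idea: solve one instance first,
# then reuse its solution as the starting point for the other instances.
obj0, x0, t0 = f1(Test_Vec, β, τ, z[:, 1])

# ...inside the model for another draw, before calling optimize!
# (x and t are the JuMP variables of that model):
set_start_value.(x, x0)
set_start_value.(t, t0)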

Hi @cvanaret, actually what varies across simulations s is not the objective function but the first set of constraints, i.e., the z[i] in the expression below.

for i = 1:Ι, j = 1:J
    @constraint(opt, τ[j,i]/z[i] + τ[j,i]*x[i] - t[j] >= 0)
end

So I think I cannot apply your suggestion, unless I pass the solution for the minimum of z in each dimension, which might be worth trying.

OK, sorry for my mistake. In this case, I would apply my suggestion to the dual problem, a (usually equivalent) LP in which the variables and constraints are exchanged. That way, the feasible set is fixed and you can exploit warm starting (starting from a known initial point).
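
If you want to try that route, Dualization.jl can form the dual automatically, so you don't have to derive it by hand. A rough, untested sketch (dual_optimizer wraps the solver so that Mosek receives the dual problem):

using Dualization   # extra package, not in the original code

# Build the primal model exactly as in f1, but let Mosek solve its dual:
opt = Model(dual_optimizer(Mosek.Optimizer))
set_silent(opt)
# ...same variables, objective, and constraints as in f1...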

Another thing that I see is that this loop is responsible for about 40% of the computation time. Is there any way to speed this up?

for i = 1:Ι, j = 1:J
    @constraint(opt, τ[j,i]/z[i] + τ[j,i]*x[i] - t[j] >= 0)
end

Any ideas, @odow?

I haven’t tested this, but:

Add a new variable:

@variable(opt, z_inv[1:Ι])
for i = 1:Ι, j = 1:J
    @constraint(opt, τ[j,i] * z_inv[i] + τ[j,i]*x[i] - t[j] >= 0)
end

And then instead of constructing a new model every time, just fix the value of z_inv:

for i in 1:Ι
    fix(z_inv[i], 1 / z[i]; force = true)
end

@odow thanks so much! It takes about the same time. If you have any other ideas I would really appreciate it.

Are there any other suggestions on improving the performance of this loop of constraints?

for i = 1:Ι, j = 1:J
    @constraint(opt, τ[j,i]/z[i] + τ[j,i]*x[i] - t[j] >= 0)
end

I think what odow suggested is that you create one JuMP model that you re-use in the inner loop of your f2. By doing this, you avoid re-creating the model and that I×J constraint loop that's taking a big part of your time.

Things would look something like

function f2(...)
    ...  # prepare your data (no changes)
    
    # Build the model only once!
    model = build_conic_model(...)  # <-- this would use odow's trick
    z_inv = model[:z_inv]
    
    # /!\ to execute this loop in parallel, you'd need as many models as you have threads
    for s in 1:Simul
        # update the model by fixing z_inv to the current draw 1/z[i,s]
        for i in 1:Ι
            fix(z_inv[i], 1 / z[i, s]; force = true)
        end
        optimize!(model)
        ...  # record objective value and solution
    end
    ...  # post-simulation stuff

    return ...
end

In the snippet above, the build_conic_model function would do almost the same as f1, except you only build the model, i.e., you don’t call optimize! yet.
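
To make that concrete, here is roughly what I have in mind (untested sketch; build_conic_model, z_inv, and the fix trick follow odow's suggestion above, and the constants σ and α are the ones defined at the top of your script):

# Untested sketch: build the model once, with z_inv as variables that get fixed later
function build_conic_model(C, β, τ)
    (J, Ι) = size(τ)
    opt = Model(Mosek.Optimizer)
    set_silent(opt)
    a = ((σ-1)/σ)^(σ-1)/(σ) .* β.^σ
    @variable(opt, x[1:Ι] >= 0)
    @variable(opt, y[1:J] >= 0)
    @variable(opt, t[1:J] >= 0)
    @variable(opt, z_inv[1:Ι])          # placeholder for 1/z[i,s], fixed before each solve
    @objective(opt, Min, x'*C + a'*y)
    @constraint(opt, [i = 1:Ι, j = 1:J], τ[j,i]*z_inv[i] + τ[j,i]*x[i] - t[j] >= 0)
    @constraint(opt, [j = 1:J], [y[j], t[j], 1] in MOI.PowerCone(1/σ))
    return opt
end

function f2(Cap, β, τ, z)
    (J, Ι) = size(τ)
    Simul = size(z, 2)
    Eπ = 0.0
    Eν = zeros(Ι)
    opt = build_conic_model(Cap, β, τ)   # build the model only once
    z_inv, x = opt[:z_inv], opt[:x]
    for s in 1:Simul
        for i in 1:Ι
            fix(z_inv[i], 1 / z[i, s]; force = true)   # only the data changes
        end
        optimize!(opt)
        Eπ += objective_value(opt) / Simul
        Eν += value.(x) ./ Simul
    end
    func = Eπ - α*sum(Cap)
    foc  = Eν .- α
    return func, foc
end

optimize! is then called repeatedly on the same model object; only the values that z_inv is fixed to change between the S solves.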


Thanks so much @mtanneau! I'll try this. Just a quick follow-up on parallelizing that inner loop: why would I need as many models as threads? Would it work if Simul > the number of threads?

The original code you wrote creates a new problem from scratch, solves it, and returns the solution.

Odow's trick consists of keeping a single JuMP model in memory throughout the simulation loop (where you iterate over s). You cannot just @threads that loop anymore, because every thread would be mutating the same JuMP object.

Therefore, if you want to have, say, 8 threads solving 8 problems in parallel (one per thread), you need 8 different JuMP objects (also one per thread).

I see. Is there any way out?

There’s no fundamental issue here: if you want to solve X problems in parallel, you need to have X problems (and as many JuMP models) in the first place. Note that using X threads will increase your memory requirements by a factor of about X.

This is not specific to JuMP; it's a common issue in multi-threaded applications. I believe the term for it is a race condition: you cannot have two different threads execute mutating operations on the same object at the same time.
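
For instance, a rough, untested way to set that up (reusing the hypothetical build_conic_model from the sketch above) is to build one model per thread and let each thread handle its own share of the simulation draws:

# Untested sketch: one JuMP model per thread, each thread solves its own subset of draws
using Base.Threads

function f2_threaded(Cap, β, τ, z)
    (J, Ι) = size(τ)
    Simul = size(z, 2)
    nt = nthreads()
    models = [build_conic_model(Cap, β, τ) for _ in 1:nt]   # one model per thread
    Eπ = zeros(nt)                                          # per-thread accumulators
    Eν = zeros(Ι, nt)
    @threads for k in 1:nt
        opt = models[k]
        z_inv, x = opt[:z_inv], opt[:x]
        for s in k:nt:Simul                                 # thread k takes draws k, k+nt, ...
            for i in 1:Ι
                fix(z_inv[i], 1 / z[i, s]; force = true)
            end
            optimize!(opt)
            Eπ[k] += objective_value(opt) / Simul
            Eν[:, k] += value.(x) ./ Simul
        end
    end
    func = sum(Eπ) - α*sum(Cap)
    foc  = vec(sum(Eν, dims = 2)) .- α
    return func, foc
end

Each thread only ever touches its own model and its own accumulator slot, so there is no shared mutable state between threads.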
