Help with Performance in Large Mixed Complementarity Problem

Hi!

I have a complementarity problem that I am solving using the awesome new additions to JuMP. The problem involves a summation over 2^J terms, where J is around 30, and performance is critical.

I was wondering if I could get any recommendations on how to improve the performance of the code, either in the choice of JuMP functions or in how to handle such a large problem.

Find an MWE below:

using JuMP, BitOperations, BenchmarkTools
import PATHSolver

const J = 10
const param1 = 0.5
const param2 = 0.1
const prob = 0.5.*ones(J)

function possible_shock_combinations(J)
    ShockSet = Array{Union{Nothing,Int}}(nothing, 2^J, J)
    let int_s = fill(false, J) #pre-allocate output
        for j = 0:2^J-1
            @. int_s = bget(j, 0:J-1)
            ShockSet[j+1, :] = int_s
        end
    end
    return convert(Matrix{Int},ShockSet)
end

function joint_probability(p::Vector{Float64}, outcomes::Vector{Int})
    # Check if the lengths of p and outcomes are the same
    if length(p) != length(outcomes)
        error("The lengths of probabilities and outcomes must be the same.")
    end
    
    prob = 1.0
    for i in 1:length(outcomes)
        if outcomes[i] == 1
            prob *= p[i]
        else
            prob *= (1 - p[i])
        end
    end
    return prob
end

function shockprobs(sequences, probs)
    (KK, K) = size(sequences)
    proba = zeros(Float64, KK)
    for i = 1:KK
        proba[i] = joint_probability(probs, sequences[i, :])
    end
    return proba
end

const shocks = possible_shock_combinations(J)
probs = shockprobs(shocks, prob)  # probability of each event


function Comp(param1, param2, shock, probs, w, i)

    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    @constraints(
        model,
        begin          
            c[j in 1:J], (-(1 - param1) * w[i] * sum(probs[s] * (1 - shock[s, j] * (1 - param2)) * sum(x[l] * (1 - shock[s, l] * (1 - param2)) for l in 1:J)^(-1) for s in 1:2^J) + w[j]) ⟂ x[j]
        end)
    JuMP.optimize!(model)
    return value.(x)
end

W = ones(J)

test = @btime Comp(param1, param2, shocks, probs, W, 1)

Hi @jmc, welcome to the forum.

I’ll take a look at your model, but as an initial comment: 2^30 is ~1 billion. Does it make sense to model all of those possible shocks? The problem is slow to build because there will be a lot of terms in your expressions.

Hi @odow, thanks for the welcoming and for all your great work!

Yes, sorry, J = 30 might be a stretch. Let’s say J = 25 or J = 20. Regardless of the value of J, any potential way to improve the performance is very welcome!


With a bit of re-arranging, I’ve simplified your problem down to:

using JuMP
import BitOperations
import PATHSolver

function possible_shock_combinations(J)
    return [BitOperations.bget(j, i) for j in 0:(2^J-1), i in 0:(J-1)]
end

function shock_probabilities(sequences, probs)
    @assert size(sequences, 2) == length(probs)
    return map(1:size(sequences, 1)) do i
        return prod(
            s_i ? p_i : (1 - p_i) for (s_i, p_i) in zip(sequences[i, :], probs)
        )
    end
end

function solve_complementarity(param1, param2, shock, probs, w, i)
    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    A = 1 .- shock .* (1 - param2)
    b = -(1 - param1) * w[i]
    @constraint(model, c[j in 1:J],
        b * sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J) + w[j] ⟂ x[j]
    )
    optimize!(model)
    return value.(x)
end

J = 10
param1 = 0.5
param2 = 0.1
prob = fill(0.5, J)
shock = possible_shock_combinations(J)
probs = shock_probabilities(shock, prob)
w = ones(J)
i = 1
x = solve_complementarity(param1, param2, shock, probs, w, i)

The problematic bit is

sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J)

For J = 30, this is a sum with ~1 billion elements, and each element has the form a / (b' * x), so the full expression has something like (2 * J) * 2^J nodes in its expression graph. And even once you have built it, JuMP has to compute derivatives of it. JuMP isn’t really built for expressions of this size.
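
To put a rough number on that (the 2 * J nodes per term is a back-of-the-envelope approximation, not an exact count of JuMP’s internals):

J = 30
terms = 2^J            # ≈ 1.07 billion summands
nodes = 2 * J * terms  # ≈ 6.4e10 nodes in the expression graph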

The work-around is usually to write a user-defined function, but PATHSolver doesn’t support them yet (the nonlinear interface is still pretty new). I’ve opened an issue: Support MOI.UserDefinedFunction · Issue #93 · chkwon/PATHSolver.jl · GitHub

Thanks!! This is awesome. I have two follow-ups:

  1. The derivatives are very easy to compute. However, I guess I cannot do this until I can pass user-defined functions to PATH. Am I understanding correctly?
  2. Memory issues aside, is it possible to parallelize this over i (in an outer loop)?

I saw somewhere that PATH is not thread-safe, but would it work with Distributed and @sync?


The derivatives are very easy to compute. However, I guess I cannot do this until I can pass user-defined functions to PATH. Am I understanding correctly?

Correct. With a user-defined function, you can pass one function to compute the value and a second function to compute the gradient. I’ve got something working locally; I will push soon.
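
For reference, a minimal sketch of the JuMP side of that interface, with a toy function rather than the model above:

using JuMP
model = Model()
@variable(model, x[1:2])
f(x...) = x[1]^2 + x[2]^2
function ∇f(g, x...)
    # write the gradient of f at x into g
    g[1] = 2 * x[1]
    g[2] = 2 * x[2]
    return
end
# register the operator: name, dimension, value function, gradient
@operator(model, op_f, 2, f, ∇f)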

Memory issues aside, is it possible to parallelize this over i (in an outer loop)?

No. In general you cannot parallelize the building of a JuMP model.

See Add support for MOI.UserDefinedFunction by odow · Pull Request #95 · chkwon/PATHSolver.jl · GitHub
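
On the parallelism question: running fully independent solves in separate worker processes is a different matter from building one model in parallel. A rough, untested sketch with Distributed, assuming PATH is installed and licensed on every worker, and where solve_one is a hypothetical stand-in for a function that builds and solves its own model:

using Distributed
addprocs(4)
@everywhere begin
    using JuMP
    import PATHSolver
    # hypothetical placeholder: in practice, build and solve an
    # independent model for index i here and return value.(x)
    solve_one(i) = i
end
results = pmap(solve_one, 1:10)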

@odow sorry to keep bothering you. Do you see big performance gains from using this function? I find that it improves my runtime by only around one second with J = 20.

Your initial solution here is around 1.3x faster.

I didn’t measure the performance gains, but I would expect most of the difference to come from the reduction in build time.

Do you have a reproducible example of the code you are using? There might be some general Julia performance issues.

Yes, sure.

See the code below:

using JuMP, BitOperations, BenchmarkTools
import PATHSolver

function possible_shock_combinations(J)
    return [BitOperations.bget(j, i) for j in 0:(2^J-1), i in 0:(J-1)]
end

function joint_probability(p, outcomes)
    # Check if the lengths of p and outcomes are the same
    if length(p) != length(outcomes)
        error("The lengths of probabilities and outcomes must be the same.")
    end
    
    prob = 1.0
    for i in 1:length(outcomes)
        if outcomes[i] == 1
            prob *= p[i]
        else
            prob *= (1 - p[i])
        end
    end
    return prob
end

function shockprobs(sequences, probs)
    (KK, K) = size(sequences)
    proba = zeros(Float64, KK)
    for i = 1:KK
        proba[i] = joint_probability(probs, sequences[i, :])
    end
    return proba
end

function solve_complementarity1(param1, param2, shock, probs, w, i)

    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    @constraints(
        model,
        begin          
            c[j in 1:J], (-(1 - param1) * w[i] * sum(probs[s] * (1 - shock[s, j] * (1 - param2)) * sum(x[l] * (1 - shock[s, l] * (1 - param2)) for l in 1:J)^(-1) for s in 1:2^J) + w[j]) ⟂ x[j]
        end)
    JuMP.optimize!(model)
    return value.(x)
end

function solve_complementarity2(param1, param2, shock, probs, w, i)
    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    A = 1 .- shock .* (1 - param2)
    b = -(1 - param1) * w[i]
    @constraint(model, c[j in 1:J],
        b * sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J) + w[j] ⟂ x[j]
    )
    optimize!(model)
    return value.(x)
end

function solve_complementarity3(param1, param2, shock, probs, w, i)
    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    A = 1 .- shock .* (1 - param2)
    # the constraint index j is passed as the first operator argument,
    # so round it back to an Int inside the callbacks
    function f(x_j, x...)
        j = round(Int, x_j)
        return sum(probs[s] * A[s, j] / sum(A[s, k] * x[k] for k in 1:J) for s in 1:2^J)
    end
    function 𝝯f(g, x_j, x...)
        j = round(Int, x_j)
        # g[1] corresponds to the index argument; g[1 + k] accumulates
        # the derivative with respect to x[k]
        for s in 1:2^J
            temp = -probs[s] * A[s, j] / sum(A[s, k] * x[k] for k in 1:J)^2
            for k in 1:J
                g[1+k] += temp * A[s, k]
            end
        end
        return
    end
    @operator(model, op_f, 1 + J, f, 𝝯f)
    @constraint(model, c[j in 1:J], -(1 - param1) * w[i] * op_f(j, x...) + w[j] ⟂ x[j])
    optimize!(model)
    return value.(x)
end

begin
    J = 17
    param1 = 0.5
    param2 = 0.1
    prob = fill(0.5, J)
    shock = possible_shock_combinations(J)
    probs = shockprobs(shock, prob)
    w = ones(J)
    i = 1
    x = @btime solve_complementarity1(param1, param2, shock, probs, w, i);
    y = @btime solve_complementarity2(param1, param2, shock, probs, w, i);
    z = @btime solve_complementarity3(param1, param2, shock, probs, w, i);
end

On my machine, I get the following timings:

 71.404 s (1192063614 allocations: 54.14 GiB)
 57.205 s (938037154 allocations: 47.21 GiB)
 61.217 s (2361773932 allocations: 54.80 GiB)

The issue is that J is a global variable that is not passed as an argument to the functions. Fixing that makes a big difference.
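
A minimal illustration of the untyped-global trap (generic Julia, unrelated to the model):

J = 20  # untyped global: every use inside a function is a dynamic lookup
f_slow() = sum(i for i in 1:2^J)   # cannot be specialized on J
f_fast(n) = sum(i for i in 1:2^n)  # pass the value as an argument instead
# (declaring `const J = 20` would also fix it)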

The next improvement is a little more complicated: you can cache the result of A * x so that you don’t need to repeatedly calculate it. This makes a big difference:

import LinearAlgebra
function solve_complementarity4(J, param1, param2, shock, probs, w, i)
    model = Model(PATHSolver.Optimizer)
    set_silent(model)
    @variable(model, x[1:J] >= 0, start = 0.5)
    A = 1 .- shock .* (1 - param2)
    # cache A * x, remembering the x it was computed for, so that f and 𝝯f
    # share a single matrix-vector product per candidate point
    cache, last_x = zeros(2^J), nothing
    function f(x_j, x...)
        j = round(Int, x_j)
        if x != last_x
            LinearAlgebra.mul!(cache, A, collect(x))
            last_x = x
        end
        return sum(probs[s] * A[s, j] / cache[s] for s in 1:2^J)
    end
    function 𝝯f(g, x_j, x...)
        j = round(Int, x_j)
        if x != last_x
            LinearAlgebra.mul!(cache, A, collect(x))
            last_x = x
        end
        for s in 1:2^J
            temp = -probs[s] * A[s, j] / cache[s]^2
            for k in 1:J
                g[1+k] += temp * A[s, k]
            end
        end
        return
    end
    @operator(model, op_f, 1 + J, f, 𝝯f)
    @constraint(model, c[j in 1:J], -(1 - param1) * w[i] * op_f(j, x...) + w[j] ⟂ x[j])
    optimize!(model)
    return value.(x)
end
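
Hypothetical usage, reusing the setup from the benchmark block above (note that J is now passed explicitly):

x4 = solve_complementarity4(J, param1, param2, shock, probs, w, i)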

This is incredible! The performance gains are massive.


@odow
Can you explain the logic a little?


Computing the cache is expensive because, with J = 30, A has 1,000,000,000 rows.

Without the cache, you need to compute it twice for each candidate x (once in f and once in 𝝯f), and J times over, because the operator appears in each of the J constraints.

So we’re actually computing sum(A[s, k] * x[k] for k in 1:J) once for every j, once for every s, and twice for f and 𝝯f, which is J * 2^J * 2 evaluations.

For the cost of storing the cache vector, we can cut that work down by a factor of 2J, and then, instead of looping over s, we can do a single matrix multiply, which might be faster.
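
Concretely, the two ways of filling the cache compute the same thing (toy sizes so the snippet stands alone):

import LinearAlgebra
J = 4
A = rand(2^J, J)
x = rand(J)
cache = zeros(2^J)
# explicit loop: 2^J separate dot products
for s in 1:2^J
    cache[s] = sum(A[s, k] * x[k] for k in 1:J)
end
# equivalent single in-place matrix-vector product (one BLAS call)
LinearAlgebra.mul!(cache, A, x)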

Have you tried solving this problem, e.g., with GAMS? I’d be interested to know how JuMP compares.

I did not really follow this!

No, I haven’t! But I can try! Is it open software?

I did not really follow this!

So the question is: how many times do we compute sum(A[s, k] * x[k] for k in 1:J)?

  • It appears once in f (for every s)
  • It appears once in 𝝯f (for every s)
  • op_f appears in J constraints

So when PATH gives us a new candidate solution x, we end up evaluating the sum(A[s, k] * x[k] for k in 1:J) expression 2 * J times (let’s ignore the factor of 2^J from s for now).

By using the cache, we need to evaluate it only once. So if this summation is the bottleneck in the code, then switching to the cache is 2J = 40x faster (when J = 20).

The sum is slow to compute because we actually need to do it 2^J times (once for each s). We could loop over s and store the values in a vector, or we could observe that we’re really computing the matrix-vector product A * x.

Is it open software?

No, you need a license. It is the software most commonly used for complementarity problems: https://www.gams.com.


@odow any idea why I might be getting the following error when running this function on a Linux high-performance server?

Closest candidates are:
  convert(::Type{Array{T, N}}, !Matched::StaticArraysCore.SizedArray{S, T, N, N, Array{T, N}}) where {S, T, N}
   @ StaticArrays ~/.julia/packages/StaticArrays/oOCPP/src/SizedArray.jl:88
  convert(::Type{Array{T, N}}, !Matched::StaticArraysCore.SizedArray{S, T, N, M, TData} where {M, TData<:AbstractArray{T, M}}) where {T, S, N}
   @ StaticArrays ~/.julia/packages/StaticArrays/oOCPP/src/SizedArray.jl:82
  convert(::Type{T}, !Matched::AbstractArray) where T<:Array
   @ Base array.jl:613
  ...

Stacktrace:
 [1] PATHSolver.Solution(status::PATHSolver.MCP_Termination, x::Nothing, info::Nothing)
   @ PATHSolver ~/.julia/packages/PATHSolver/QQzLy/src/MOI_wrapper.jl:357
 [2] optimize!(model::MathOptInterface.Utilities.GenericOptimizer{Float64, MathOptInterface.Utilities.ObjectiveContainer{Float64}, MathOptInterface.Utilities.VariablesContainer{Float64}, PATHSolver.OptimizerFunctionConstraints{Float64}})
   @ PATHSolver ~/.julia/packages/PATHSolver/QQzLy/src/MOI_wrapper.jl:390
 [3] optimize!
   @ ~/.julia/packages/MathOptInterface/DDWnF/src/Bridges/bridge_optimizer.jl:380 [inlined]
 [4] optimize!
   @ ~/.julia/packages/MathOptInterface/DDWnF/src/MathOptInterface.jl:85 [inlined]
 [5] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{MathOptInterface.Utilities.GenericOptimizer{Float64, MathOptInterface.Utilities.ObjectiveContainer{Float64}, MathOptInterface.Utilities.VariablesContainer{Float64}, PATHSolver.OptimizerFunctionConstraints{Float64}}}, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}})
   @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/DDWnF/src/Utilities/cachingoptimizer.jl:316
 [6] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ JuMP ~/.julia/packages/JuMP/027Gt/src/optimizer_interface.jl:448
 [7] optimize!
   @ ~/.julia/packages/JuMP/027Gt/src/optimizer_interface.jl:409 [inlined]
 [8] firm_solution_simul(J::Int64, L::Vector{Float64}, S::Int64, β::Float64, σ::Float64, χ::Float64, shocks::BitMatrix, w::Vector{Float64}, z::Vector{Float64}, Mat::Matrix{Float64}, i::Int64)
   @ Main /scratch/gpfs/folder/model.jl:1492
 [9] top-level scope
   @ /scratch/gpfs/folder/model.jl:1667
in expression starting at /scratch/gpfs/folder/model.jl:1667
srun: error: della-h12n2: task 0: Exited with exit code 1
srun: Terminating StepId=53909708.0

What is the first part of the error message? (There should be a few lines before “Closest candidates are”)

Do you have a reproducible example?