# Help with Performance in Large Mixed Complementarity Problem

Hi!

I have a complementarity problem in which I am using the new awesome additions to JuMP to solve it. The problem involves a summation over 2^J terms, where J is around 30 and performance is critical.

I was wondering if I could get any recommendations on how to improve the performance of the code either on the proper JuMP function and on how to handle such a large problem.

Find a MWE below:

``````using JuMP, Combinatorics, BitOperations, BenchmarkTools
import PATHSolver

const J = 10
const param1 = 0.5
const param2 = 0.1
const prob = 0.5.*ones(J)

function possible_shock_combinations(J)
ShockSet = Array{Union{Nothing,Int}}(nothing, 2^J, J)
let int_s = fill(false, J) #pre-allocate output
for j = 0:2^J-1
@. int_s = bget(j, 0:J-1)
ShockSet[j+1, :] = int_s
end
end
return convert(Matrix{Int},ShockSet)
end

function joint_probability(p::Vector{Float64}, outcomes::Vector{Int})
# Check if the lengths of p and outcomes are the same
if length(p) != length(outcomes)
println("The lengths of probabilities and outcomes must be the same.")
end

prob = 1.0
for i in 1:length(outcomes)
if outcomes[i] == 1
prob *= p[i]
else
prob *= (1 - p[i])
end
end
return prob
end

function shockprobs(sequences,probs)
(KK, K) = size(sequences)
proba = zeros(Float64, KK)
for i = 1:KK
proba[i] = joint_probability(probs, sequences[i, :])
end
return proba
end

const shocks = possible_shock_combinations(J)
probs = shockprobs(shocks,prob) #probability of each event#

function Comp(param1, param2, shock, probs, w, i)

model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
@constraints(
model,
begin
c[j in 1:J], (-(1-param1)*w[i]*sum(probs[s]*(1-shock[s,j]*(1-param2))*sum(x[l]*(1 -shock[s,l]*(1-param2)) for l in 1: J)^(-1) for s in 1:2^J) + w[j])⟂ x[j]
end)
JuMP.optimize!(model)
return value.(x)
end

W = ones(J)

test = @btime Comp(param1, param2, shocks, probs, W, 1)
``````
1 Like

Hi @jmc, welcome to the forum.

I’ll take a look at your model, but as an initial comment: 2^30 is ~1 billion. Does it make sense to model all of those possible shocks? The problem is slow to build because there will be a lot of terms in your expressions.

Hi @odow, thanks for the welcoming and for all your great work!

Yes, sorry J=30 might be a stretch. Let’s say J = 25 or J = 20. Regardless of the value of J, any potential way to improve the performance is very welcomed!!

1 Like

With a bit of re-arranging, I’ve simplified your problem down to:

``````using JuMP
import BitOperations
import PATHSolver

function possible_shock_combinations(J)
return [BitOperations.bget(j, i) for j in 0:(2^J-1), i in 0:(J-1)]
end

function shock_probabilities(sequences, probs)
@assert size(sequences, 2) == length(probs)
return map(1:size(sequences, 1)) do i
return prod(
s_i ? p_i : (1 - p_i) for (s_i, p_i) in zip(sequences[i, :], probs)
)
end
end

function solve_complementarity(param1, param2, shock, probs, w, i)
model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
A = 1 .- shock .* (1 - param2)
b = -(1 - param1) * w[i]
@constraint(model, c[j in 1:J],
b * sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J) + w[j] ⟂ x[j]
)
optimize!(model)
return value.(x)
end

J = 10
param1 = 0.5
param2 = 0.1
prob = fill(0.5, J)
shock = possible_shock_combinations(J)
probs = shock_probabilities(shock, prob)
w = ones(J)
i = 1
x = solve_complementarity(param1, param2, shock, probs, w, i)
``````

The problematic bit is

``````sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J)
``````

For `J = 30`, this is a sum with 1 billion elements. And each element has the form `a / b' * x`, so this is an expression with something like `(2 * J) * 2^J` nodes in the expression graph. And even once you have build it, JuMP has to compute derivatives of it. JuMP isn’t really built for expressions of this size.

The work-around is usually to write a user-defined function, but PATHSolver doesn’t support them yet (the nonlinear interface is still pretty new). I’ve opened an issue: Support MOI.UserDefinedFunction · Issue #93 · chkwon/PATHSolver.jl · GitHub

Thanks!! This is awesome. I have two follow ups:

1. the derivatives are very easy to compute. However, I guess one cannot do this until I can pass User-Defined functions to PATH. Am I understanding correctly?
2. Memory issues aside, is it possible to parallelize this over i (in an outer loop)?

I saw somewhere that Path is not threads stable, but with Distributed and sync?

1 Like

the derivatives are very easy to compute. However, I guess one cannot do this until I can pass User-Defined functions to PATH. Am I understanding correctly?

Correct. With a user-defined function, you can pass a function not compute the value and a second function to compute the gradient. I’ve got something working locally, will push soon.

Memory issues aside, is it possible to parallelize this over i (in an outer loop)?

No. In general you cannot parallelize the building of a JuMP model.

@odow sorry for keep bothering. Do you find big performance gains from using this function? I find that it improves my function just in around one second with J=20.

What is around 1.3x faster is your initial solution here

I didn’t try the performance gains. But i would expect most of the difference would come from the reduction in build time.

Do you have a reproducible example of the code you are using? There might be some general Julia performance issues.

Yes, sure.

See the code below:

``````using JuMP, Combinatorics, BitOperations, BenchmarkTools
import PATHSolver

function possible_shock_combinations(J)
return [BitOperations.bget(j, i) for j in 0:(2^J-1), i in 0:(J-1)]
end

function joint_probability(p, outcomes)
# Check if the lengths of p and outcomes are the same
if length(p) != length(outcomes)
println("The lengths of probabilities and outcomes must be the same.")
end

prob = 1.0
for i in 1:length(outcomes)
if outcomes[i] == 1
prob *= p[i]
else
prob *= (1 - p[i])
end
end
return prob
end

function shockprobs(sequences,probs)
(KK, K) = size(sequences)
proba = zeros(Float64, KK)
for i = 1:KK
proba[i] = joint_probability(probs, sequences[i, :])
end
return proba
end

function solve_complementarity1(param1, param2, shock, probs, w, i)

model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
@constraints(
model,
begin
c[j in 1:J], (-(1-param1)*w[i]*sum(probs[s]*(1-shock[s,j]*(1-param2))*sum(x[l]*(1 -shock[s,l]*(1-param2)) for l in 1: J)^(-1) for s in 1:2^J) + w[j])⟂ x[j]
end)
JuMP.optimize!(model)
return value.(x)
end

function solve_complementarity2(param1, param2, shock, probs, w, i)
model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
A = 1 .- shock .* (1 - param2)
b = -(1 - param1) * w[i]
@constraint(model, c[j in 1:J],
b * sum(probs[s] * A[s, j] / (A[s, :]' * x) for s in 1:2^J) + w[j] ⟂ x[j]
)
optimize!(model)
return value.(x)
end

function solve_complementarity3(param1, param2, shock, probs, w, i)
model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
A = 1 .- shock .* (1 - param2)
function f(x_j, x...)
j = round(Int, x_j)
return sum(probs[s] * A[s, j] / sum(A[s, k] * x[k] for k in 1:J) for s in 1:2^J)
end
function 𝝯f(g, x_j, x...)
j = round(Int, x_j)
for s in 1:2^J
temp = -probs[s] * A[s, j] / sum(A[s, k] * x[k] for k in 1:J)^2
for k in 1:J
g[1+k] += temp * A[s, k]
end
end
return
end
@operator(model, op_f, 1 + J, f, 𝝯f)
@constraint(model, c[j in 1:J], -(1 - param1) * w[i] * op_f(j, x...) + w[j] ⟂ x[j])
optimize!(model)
return value.(x)
end

begin
J = 17
param1 = 0.5
param2 = 0.1
prob = fill(0.5, J)
shock = possible_shock_combinations(J)
probs = shockprobs(shock, prob)
w = ones(J)
i = 1
x = @btime  solve_complementarity1(param1, param2, shock, probs, w, i);
y = @btime  solve_complementarity2(param1, param2, shock, probs, w, i);
z = @btime solve_complementarity3(param1, param2, shock, probs, w, i);
end
``````

In my machine, I get the following performance:

`````` 71.404 s (1192063614 allocations: 54.14 GiB)
57.205 s (938037154 allocations: 47.21 GiB)
61.217 s (2361773932 allocations: 54.80 GiB)
``````

The issue is that `J` is a global variable that is not passed as an argument to the functions. Fixing that makes a big difference.

The next improvement is a little more complicated: you can cache the result of `A * x` so that you don’t need to repeatedly calculate it. This has a biiiig difference:

``````import LinearAlgebra
function solve_complementarity4(J, param1, param2, shock, probs, w, i)
model = Model(PATHSolver.Optimizer)
set_silent(model)
@variable(model, x[1:J] >= 0, start = 0.5)
A = 1 .- shock .* (1 - param2)
cache, last_x = zeros(2^J), nothing
function f(x_j, x...)
j = round(Int, x_j)
if x != last_x
LinearAlgebra.mul!(cache, A, collect(x))
last_x = x
end
return sum(probs[s] * A[s, j] / cache[s] for s in 1:2^J)
end
function 𝝯f(g, x_j, x...)
j = round(Int, x_j)
if x != last_x
LinearAlgebra.mul!(cache, A, collect(x))
last_x = x
end
for s in 1:2^J
temp = -probs[s] * A[s, j] / cache[s]^2
for k in 1:J
g[1+k] += temp * A[s, k]
end
end
return
end
@operator(model, op_f, 1 + J, f, 𝝯f)
@constraint(model, c[j in 1:J], -(1 - param1) * w[i] * op_f(j, x...) + w[j] ⟂ x[j])
optimize!(model)
return value.(x)
end
``````

This is incredible! The performance gains are massive.

1 Like

@odow
Can you explain a little bit the logic?

1 Like

Computing the `cache` is expensive, because `A` has 1,000,000,000 rows.

Without the cache, you need to compute it twice for each `x` (in `f` and `𝝯f`), …and… `J` times for each of the constraints.

So we’re actually computing `sum(A[s, k] * x[k] for k in 1:J)` once for every J, once for every `2`, and twice for `f` and `𝝯f`, which is `J * (2^J) * 2`.

For the cost of storing the `cache` vector, we can simplify things down by `2J`, and then instead of looping over `s`, we can do a single matrix multiply, which might be faster.

Have you tried solving this problem, e.g., with GAMS? I’d be interested to know how JuMP compares.

I did not really follow this!

No I haven’t! But I can try! Is it open software?

I did not really follow this!

So the question is: how many times do we compute `sum(A[s, k] * x[k] for k in 1:J)`.

• It appears once in `f` (for every `s`)
• It appears once in `𝝯f` (for every `s`)
• `op_f` appears in `J` constraints

So if PATH gives us a new vector `x` with a candidate solution, we end up evaluating the `sum(A[s, k] * x[k] for k in 1:J)` expression `2 * J` times (let’s ignore the `s` for now).

By using the `cache`, we need only to evaluate it once. So if this summation is a bottleneck in the code, then changing to the cache is 40x faster (when J = 20).

The sum is slow to compute, because we actually need to do it `2^J` times (once for each `s`. We could loop over `s` and store the values in a vector, or we could observe that we’re really computing the matrix-vector multiply `A * x`.

Is it open software?

No, you need a license. It is the software most commonly used for complementarity problems: https://www.gams.com.

1 Like