JuMP Container Benchmark

Hello, we are Julia beginners. We want to use Julia for solving linear programs using JuMP and some external solver. For very large LP problems the model generation time can be significant.

Since JuMP covers many different options to store variables, constraints and coefficients of a model we found it important to know the speed and memory usage of the different options.

If you are experienced with Julia/JuMP, the optimization Packages (CPLEX, Gurobi, Mosek, Xpress) or with applied linear programming in general we would be very thankful if you could check the results and address the open questions(see below).

We mainly tried different container types:

  • Normal Julia arrays with one based indexing

  • JuMP DenseAxisArrays which allow string indexing

  • Normal Julia dictionaries

  • Dictionaries.jl

  • NamedDims.jl

  • and also experimented with the direct_mode option.

The containers were applied to:

  • the JuMP variables ,

  • JuMP constraints

  • coefficients


We used Julia v1.5.1 and JuMP v0.21.3 and CPLEX.jl v0.6.6 The same Data is used for all tests. We built a simple linear example model with 1000 objective variables x[1:1000] and 1000 constrains and want to maximize the objective function sum(x).

You can find the code we used to create the benchmarks at: “https://github.com/SonntagK/JuMPContainerBenchmarks

There you can also find the results in the folder called Results as .xlsx files:
https://github.com/SonntagK/JuMPContainerBenchmarks/tree/main/Results




Here is a part of the results. The memory estimate, median time and number of samples is obtained using the BenchmarkTools.jl. Everything we wrote is wrapped in functions and we used the second benchmarking results as recommended. In the end you can find an example how I create the data and build the model. In the code I used for the benchmark results everything is seperated into modules.

Surprisingly JuMP DenseAxisArrays were much faster than normal one based indexing Julia arrays. We are happy (since we like string indexing), but wonder if there is some problem in our use or the implementation of the normal array container.

From reading JuMP presentations, we expected that using the direct mode will speedup the model generation. But we could not observe this. Is this correct ??

Possibly using some containers (e.g. string based indexing with 20 random characters) may slow down the speed of the solvers. Therefore, the benchmark of the model only generation is compared to the benchmark of model generation + solving the optimization problem. We did not find any significant impact of the container on the solver speed.

Summary:

  • Quesition1: Why are Arrays so slow compared to other container types, espescially compared to DenseAxisArrays?

  • Question2: The direct_mode option does not help. Why? Are our models to small to see the impact?

For any further remarks or recommendations We would be very thankful.

Example for data creation, model build and optimization for denseAxisArray:

using JuMP, CPLEX
# create random data
n = 1000
arrayCoeff = rand(n,n)
arrayBound = rand(n)


#transform Data to denseAxisArray
coeff = JuMP.containers.denseAxisArray(arrayCoeff,1:n,1:n)
bound = JuMP.container.desnseAxisArray(arrayBound,1:n)

#model build
m = Model()
set_optimizer(m,CPLEX.Optimizer)
@variable(m, 0<=x[1:n]<=1, container = DenseAxisArray)
@constraint(m, con[i = 1:n], sum(coeff[i,j]*x[j] for j in 1:n) <= bound[i], container = DenseAxisArray)
@objective(m,Max,sum(x))

#optimization
JuMP.optimize!(m) 


Kind Regards

Konstantin

edit1: Added link to results, added jpg of results and explanaition how the numbers arise
edit2: Added an example

1 Like

Quesition1: Why are Arrays so slow compared to other container types, espescially compared to DenseAxisArrays?

This is a very good question.
I think something weird must be happening.

In particular, as the primary maintainer of NamedDims.jl I know it can’t be faster than Array.
It’s core tenent is to have zero runtime overhead beyond the same operations on Array.
But it can’t be faster – it delegates all operations to Array after doing some extra things (that compile away).

Something spooky is happening

2 Likes

Presumably, you are benchmarking in global scope. Take a read of the performance tips: https://docs.julialang.org/en/v1/manual/performance-tips/index.html.

Direct mode may lead to a modest speed up in creation time. The bigger reason to use it is that it should halve the memory usage, because we only create one copy of the constraint matrix (just inside the solver), instead of two (one inside JuMP, one inside the solver).

Edit: I looked at your benchmarking code. It was quite convoluted, so I don’t think it is measuring what you think it is measuring. Here is a simpler example, showing DenseAxisArray’s being slower than Array. However, I would encourage you to use the most readable data-structure. The time taken to build the model is almost never the bottleneck compared to actually solving.

julia> using BenchmarkTools

julia> using JuMP

julia> function bench_array(n)
           A = rand(n, n)
           b = rand(n)
           model = Model()
           set_silent(model)
           @variable(model, 0 <= x[1:n] <= 1)
           @constraint(model, con[i = 1:n], sum(A[i, j] * x[j] for j = 1:n) <= b[i])
           @objective(model, Max, sum(x))
           return
       end
bench_array (generic function with 1 method)

julia> function bench_axis(n)
           indices = 1:n
           A = Containers.DenseAxisArray(rand(n, n), indices, indices)
           b = Containers.DenseAxisArray(rand(n), indices)
           model = Model()
           set_silent(model)
           @variable(model, 0 <= x[indices] <= 1)
           @constraint(model, con[i = indices], sum(A[i, j] * x[j] for j = 1:n) <= b[i])
           @objective(model, Max, sum(x))
           return
       end
bench_axis (generic function with 1 method)

julia> n = 1_000
1000

julia> @benchmark bench_array($n)
BenchmarkTools.Trial: 
  memory estimate:  108.63 MiB
  allocs estimate:  48306
  --------------
  minimum time:     118.906 ms (7.10% GC)
  median time:      132.233 ms (11.31% GC)
  mean time:        132.017 ms (10.73% GC)
  maximum time:     143.646 ms (10.39% GC)
  --------------
  samples:          38
  evals/sample:     1

julia> @benchmark bench_axis($n)
BenchmarkTools.Trial: 
  memory estimate:  109.08 MiB
  allocs estimate:  48405
  --------------
  minimum time:     168.844 ms (7.40% GC)
  median time:      176.237 ms (7.37% GC)
  mean time:        177.120 ms (7.82% GC)
  maximum time:     192.343 ms (9.70% GC)
  --------------
  samples:          29
  evals/sample:     1
3 Likes

We managed to find the mistake. When defining the constraints the code was written as

@constraint(m, con[i = 1:n], sum(coeff[i,j].*x[j] for j = 1:n) <= bound[i] )

instead of

@constraint(m, con[i = 1:n], sum(coeff[i,j]*x[j] for j = 1:n) <= bound[i] )

which lead to a significant increase in model building time.

With this change the results are more reasonable and the array container is way faster:

2 Likes