Computing the variance of variables is very slow in JuMP

Consider the small code snippet below

using JuMP
using Statistics

model = Model()
n = 1000
@variable(model, x[1:n], Bin)
@time @objective(model, Min, var(x .* rand(n)))

The timing is 162.785950 seconds (93.08 k allocations: 53.536 GiB, 3.53% gc time).

Is this speed expected? If not, how shall I improve it? Any suggestion is appreciated.

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a550 (2024-04-30 10:59 UTC)
Build Info:
  Official release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700F
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 20 default, 0 interactive, 10 GC (on 20 virtual cores)

The call to Statistics.var creates a dense quadratic matrix, so with n = 1_000 there are 500,500 terms in your objective function.
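As a quick sanity check on that count (a sketch; this is just the arithmetic n(n + 1)/2, not anything JuMP-specific): the number of distinct products x[i] * x[j] with i <= j for n = 1_000 is

```julia
n = 1_000

# Number of distinct quadratic terms x[i] * x[j] with i <= j that appear
# when the variance of n decision variables is expanded in full.
num_quad_terms = binomial(n + 1, 2)   # n * (n + 1) / 2 = 500_500
```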

Do instead:

julia> using JuMP, Statistics

julia> function main(n)
           model = Model()
           @variable(model, x[1:n], Bin)
           @objective(model, Min, var(x .* rand(n)))
           return model
       end
main (generic function with 1 method)

julia> function my_main(n)
           model = Model()
           @variable(model, x[1:n], Bin)
           @variable(model, μ)
           @expression(model, y, x .* rand(n))
           @constraint(model, μ == sum(y) / n)
           @objective(model, Min, sum((yi - μ)^2 for yi in y))
           return model
       end
my_main (generic function with 1 method)

julia> @time main(200);
  1.320286 seconds (16.76 k allocations: 598.565 MiB, 4.88% gc time)

julia> @time my_main(200);
  0.000551 seconds (7.21 k allocations: 765.625 KiB)

Many thanks. The speedup is excellent. Could you please explain more about “the call to Statistics.var creates a dense quadratic matrix”? Did you mean that the dense matrix memory allocation is expensive? But the memory itself should not be that expensive.

When dealing with numbers:

julia> using Statistics

julia> x = rand(1000);

julia> @allocated var(x)

The other issue is the sheer number of operations involved. The default implementation of Statistics.var does not exploit common subexpressions, so building the objective is an O(n^3) operation: each x_i - mean(x) is an affine expression with n terms, squaring it is O(n^2), and that work is repeated for each of the n elements.
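A rough back-of-the-envelope count makes the cubic growth concrete (a sketch of the operation count, not JuMP's exact term bookkeeping):

```julia
# Rough cost of expanding var(x) over n decision variables without reusing
# the shared mean(x) subexpression. Illustrative arithmetic only.
pairs_per_square(n) = n * (n + 1) ÷ 2   # products from squaring an n-term affine expression
total_ops(n) = n * pairs_per_square(n)  # repeated for each of the n squared terms

total_ops(1_000)   # 500_500_000, which is why n = 1_000 takes minutes
```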

using JuMP, Plots, Statistics

function main(n)
    model = Model()
    @variable(model, x[1:n])
    return var(x)
end

x = [10, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
y = [(@elapsed main(n)) for n in x]
a = Statistics.mean(y ./ x.^3)
Plots.plot(x, y)
Plots.plot!(x, a * x.^3)

