# Computing variance of variables is very slow in JuMP

Consider the small code snippet below

``````using JuMP
using Statistics

model = Model()
n = 1000
@variable(model, x[1:n], Bin)
@time @objective(model, Min, var(x .* rand(n)))

``````

The timing is `162.785950 seconds (93.08 k allocations: 53.536 GiB, 3.53% gc time)`.

Is this speed expected? If not, how shall I improve it? Any suggestion is appreciated.

``````julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a550 (2024-04-30 10:59 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 Ă— 12th Gen Intel(R) Core(TM) i7-12700F
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 20 default, 0 interactive, 10 GC (on 20 virtual cores)
``````

The call to `Statistics.var` creates a dense quadratic matrix, so with `n = 1_000` there are 500,500 terms in your objective function.

``````julia> using JuMP, Statistics

julia> function main(n)
model = Model()
@variable(model, x[1:n], Bin)
@objective(model, Min, var(x .* rand(n)))
return model
end
main (generic function with 1 method)

julia> function my_main(n)
model = Model()
@variable(model, x[1:n], Bin)
@variable(model, ÎĽ)
@expression(model, y, x .* rand(n))
@constraint(model, ÎĽ == sum(y) / n)
@objective(model, Min, sum((yi - ÎĽ)^2 for yi in y))
return model
end
my_main (generic function with 1 method)

julia> @time main(200);
1.320286 seconds (16.76 k allocations: 598.565 MiB, 4.88% gc time)

julia> @time my_main(200);
0.000551 seconds (7.21 k allocations: 765.625 KiB)
``````
1 Like

Many thanks. The speedup is excellent. Could you please explain more about â€śthe call to `Statistics.var` creates a dense quadratic matrixâ€ť? Did you mean that the dense matrix memory allocation is expensive? But the memory itself should not be that expensive.

When dealing with numbers:

``````julia> using Statistics

julia> x = rand(1000)

julia> @allocated var(x)
16
``````

The other issue is just the sheer number of operations that happen. The default implementation in `Statistics.var` does not exploit common subexpressions. So itâ€™s an O(n^3) operation. (There O(n^2) to compute each term, looped over yi in y for the extra `n`)

``````using JuMP, Statistics
function main(n)
model = Model()
@variable(model, x[1:n])
return var(x)
end
main(10);
x = [10, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
y = [(@elapsed main(n)) for n in x]
a = Statistics.mean(y ./ x.^3)
Plots.plot(x, y)
Plots.plot!(x, a * x.^3)
``````

1 Like