JuMP, excessive memory usage?

question

#1

Hi all,

I have a question about the memory requirements of the following toy example, which is a simplified versions of my real problem:

using JuMP
using Ipopt

N = 29

p1 = rand(N+1,N+1,N+1)
p2 = rand(N+1,N+1,N+1)
p3 = rand(N+1,N+1,N+1)

h = 1.0/N

m = Model()

@variable(m, u1[1:(N+1),1:(N+1),1:(N+1)])
@variable(m, u2[1:(N+1),1:(N+1),1:(N+1)])

@NLobjective(m, Min, sum{ ( ( h * (30*u1[i,j,k]^2 - 20*u1[i,j,k]*u1[i,j+1,k] - 20*u1[i,j,k]*u1[i,j,k+1] - 10*u1[i+1,j,k]*u1[i,j+1,k] - 10*u1[i+1,j,k]*u1[i,j,k+1] - 10*u1[i,j+1,k]*u1[i,j,k+1] - 20*u2[i,j,k]*u2[i+1,j,k] - 20*u2[i,j,k]*u2[i,j+1,k] - 20*u2[i,j,k]*u2[i,j,k+1] - 10*u2[i+1,j,k]*u2[i,j+1,k] - 10*u2[i+1,j,k]*u2[i,j,k+1] - 10*u2[i,j+1,k]*u2[i,j,k+1] - 20*u1[i+1,j,k]*u1[i+1,j+1,k] - 20*u1[i+1,j,k]*u1[i+1,j,k+1] - 20*u1[i,j+1,k]*u1[i+1,j+1,k] - 20*u1[i,j+1,k]*u1[i,j+1,k+1] - 20*u1[i,j,k+1]*u1[i+1,j,k+1] - 20*u1[i,j,k+1]*u1[i,j+1,k+1] - 20*u2[i+1,j,k]*u2[i+1,j+1,k] - 20*u2[i+1,j,k]*u2[i+1,j,k+1] - 20*u2[i,j+1,k]*u2[i+1,j+1,k] - 20*u2[i,j+1,k]*u2[i,j+1,k+1] - 20*u2[i,j,k+1]*u2[i+1,j,k+1] + 5*p2[i,j,k]*u2[i,j+1,k]*u1[i,j+1,k+1] - 5*p2[i,j,k]*u2[i,j,k+1]*u1[i+1,j,k+1] + 10*p2[i,j,k]*u2[i,j,k+1]*u1[i,j+1,k+1] - 5*p3[i,j,k]*u2[i+1,j,k]*u1[i+1,j+1,k] - 10*p3[i,j,k]*u2[i+1,j,k]*u1[i+1,j,k+1] - 5*p3[i,j,k]*u2[i,j+1,k]*u1[i+1,j+1,k] + 10*p3[i,j,k]*u2[i,j+1,k]*u1[i,j+1,k+1] - 5*p3[i,j,k]*u2[i,j,k+1]*u1[i+1,j,k+1] + 5*p3[i,j,k]*u2[i,j,k+1]*u1[i,j+1,k+1] - 5*p1[i,j,k]*u1[i+1,j,k]*u2[i+1,j+1,k] - 5*p1[i,j,k]*u1[i+1,j,k]*u2[i+1,j,k+1] - 10*p1[i,j,k]*u1[i,j+1,k]*u2[i+1,j+1,k] + 5*p1[i,j,k]*u1[i,j+1,k]*u2[i,j+1,k+1] - 10*p1[i,j,k]*u1[i,j,k+1]*u2[i+1,j,k+1] + 5*p1[i,j,k]*u1[i,j,k+1]*u2[i,j+1,k+1] - 10*p2[i,j,k]*u1[i+1,j,k]*u2[i+1,j+1,k] + 5*p2[i,j,k]*u1[i+1,j,k]*u2[i+1,j,k+1] - 5*p2[i,j,k]*u1[i,j+1,k]*u2[i+1,j+1,k] - 5*p2[i,j,k]*u1[i,j+1,k]*u2[i,j+1,k+1] + 5*p2[i,j,k]*u1[i,j,k+1]*u2[i+1,j,k+1] - 10*p2[i,j,k]*u1[i,j,k+1]*u2[i,j+1,k+1] + 5*p3[i,j,k]*u1[i+1,j,k]*u2[i+1,j+1,k] + 10*p3[i,j,k]*u1[i+1,j,k]*u2[i+1,j,k+1] + 5*p3[i,j,k]*u1[i,j+1,k]*u2[i+1,j+1,k] - 10*p3[i,j,k]*u1[i,j+1,k]*u2[i,j+1,k+1] + 5*p3[i,j,k]*u1[i,j,k+1]*u2[i+1,j,k+1] - 5*p3[i,j,k]*u1[i,j,k+1]*u2[i,j+1,k+1] + 20*p1[i,j,k]*u2[i,j+1,k]*u1[i+1,j+1,k+1] + 20*p1[i,j,k]*u2[i,j,k+1]*u1[i+1,j+1,k+1] + 20*p2[i,j,k]*u2[i+1,j,k]*u1[i+1,j+1,k+1] + 20*p2[i,j,k]*u2[i,j,k+1]*u1[i+1,j+1,k+1] + 10*p3[i,j,k]*u2[i+1,j,k]*u1[i+1,j+1,k+1] + 20*p3[i,j,k]*u2[i,j+1,k]*u1[i+1,j+1,k+1] - 5*p1[i,j,k]*u2[i+1,j+1,k]*u1[i+1,j+1,k+1] - 5*p1[i,j,k]*u2[i+1,j,k+1]*u1[i+1,j+1,k+1] + 10*p1[i,j,k]*u2[i,j+1,k+1]*u1[i+1,j+1,k+1] - 5*p2[i,j,k]*u2[i+1,j+1,k]*u1[i+1,j+1,k+1] + 10*p2[i,j,k]*u2[i+1,j,k+1]*u1[i+1,j+1,k+1] - 5*p2[i,j,k]*u2[i,j+1,k+1]*u1[i+1,j+1,k+1] + 10*p3[i,j,k]*u2[i+1,j+1,k]*u1[i+1,j+1,k+1] + 5*p3[i,j,k]*u2[i+1,j,k+1]*u1[i+1,j+1,k+1] - 5*p3[i,j,k]*u2[i,j+1,k+1]*u1[i+1,j+1,k+1] - 20*p1[i,j,k]*u1[i,j+1,k]*u2[i+1,j+1,k+1] - 20*p1[i,j,k]*u1[i,j,k+1]*u2[i+1,j+1,k+1] - 20*p2[i,j,k]*u1[i+1,j,k]*u2[i+1,j+1,k+1] - 20*p2[i,j,k]*u1[i,j,k+1]*u2[i+1,j+1,k+1] - 10*p3[i,j,k]*u1[i+1,j,k]*u2[i+1,j+1,k+1] - 20*p3[i,j,k]*u1[i,j+1,k]*u2[i+1,j+1,k+1] + 5*p1[i,j,k]*u1[i+1,j+1,k]*u2[i+1,j+1,k+1] + 5*p1[i,j,k]*u1[i+1,j,k+1]*u2[i+1,j+1,k+1] - 10*p1[i,j,k]*u1[i,j+1,k+1]*u2[i+1,j+1,k+1] + 5*p2[i,j,k]*u1[i+1,j+1,k]*u2[i+1,j+1,k+1] - 10*p2[i,j,k]*u1[i+1,j,k+1]*u2[i+1,j+1,k+1] + 5*p2[i,j,k]*u1[i,j+1,k+1]*u2[i+1,j+1,k+1] - 10*p3[i,j,k]*u1[i+1,j+1,k]*u2[i+1,j+1,k+1] - 5*p3[i,j,k]*u1[i+1,j,k+1]*u2[i+1,j+1,k+1] + 5*p3[i,j,k]*u1[i,j+1,k+1]*u2[i+1,j+1,k+1]) + h^3 * (p1[i,j,k]^2*u1[i,j,k]*u1[i+1,j,k] + p1[i,j,k]^2*u1[i,j,k]*u1[i,j+1,k] + p1[i,j,k]^2*u1[i,j,k]*u1[i,j,k+1] + p2[i,j,k]^2*u1[i,j,k]*u1[i+1,j,k] + p2[i,j,k]^2*u1[i,j,k]*u1[i,j+1,k] + p2[i,j,k]^2*u1[i,j,k]*u1[i,j,k+1] + p3[i,j,k]^2*u1[i,j,k]*u1[i+1,j,k] + p3[i,j,k]^2*u1[i,j,k]*u1[i,j+1,k] + p3[i,j,k]^2*u1[i,j,k]*u1[i,j,k+1] + 4*p1[i,j,k]^2*u1[i+1,j,k]*u1[i,j+1,k] + 4*p1[i,j,k]^2*u1[i+1,j,k]*u1[i,j,k+1] + 4*p1[i,j,k]^2*u1[i,j+1,k]*u1[i,j,k+1] + p1[i,j,k]^2*u2[i,j,k]*u2[i+1,j,k] + p1[i,j,k]^2*u2[i,j,k]*u2[i,j+1,k] + p1[i,j,k]^2*u2[i,j,k]*u2[i,j,k+1] + 4*p2[i,j,k]^2*u1[i+1,j,k]*u1[i,j+1,k] + 4*p2[i,j,k]^2*u1[i+1,j,k]*u1[i,j,k+1] + 4*p2[i,j,k]^2*u1[i,j+1,k]*u1[i,j,k+1] + p2[i,j,k]^2*u2[i,j,k]*u2[i+1,j,k] + p2[i,j,k]^2*u2[i,j,k]*u2[i,j+1,k] + p2[i,j,k]^2*u2[i,j,k]*u2[i,j,k+1] + 4*p3[i,j,k]^2*u1[i+1,j,k]*u1[i,j+1,k] + 4*p3[i,j,k]^2*u1[i+1,j,k]*u1[i,j,k+1] + 4*p3[i,j,k]^2*u1[i,j+1,k]*u1[i,j,k+1] + p3[i,j,k]^2*u2[i,j,k]*u2[i+1,j,k] + p3[i,j,k]^2*u2[i,j,k]*u2[i,j+1,k] + p3[i,j,k]^2*u2[i,j,k]*u2[i,j,k+1] + 4*p1[i,j,k]^2*u2[i+1,j,k]*u2[i,j+1,k] + p3[i,j,k]^2*u2[i,j+1,k+1]*u2[i+1,j+1,k+1]))/(60) )
, i=1:N, j=1:N, k=1:N})

solve(m)

Running this in Julia results in a >4GB memory usage, which I suppose is because of the long objective function. Am I doing something wrong or is it common? In the second case is there a clever strategy to reduce this high memory usage?
Indeed my real objective function is 10 times longer than the one above, and so I run out of RAM very quickly even for small values of N…


#2

I believe not only JuMP, but most Julia packages are currently eating all the memory possible. My package is allocating GB of memory in Julia v0.5 too. I tried to optimize it, but without success.


#3

This is not an answer to your problem but as a general tips. Try to avoid having code of the type a+b+c+d+e.... Instead break it up into (a+b+c) + (d+e+f) + ....

As an example:

julia> f(b) = b+b+b+b+b+b+b;

julia> @code_llvm f(b)

define i64 @julia_f_71732(i64) #0 {
top:
  %1 = mul i64 %0, 7
  ret i64 %1
}

Too many additions:

julia> g(b) = b+b+b+b+b+b+b+b;

julia> @code_llvm g(b)

define i64 @julia_g_71737(i64) #0 {
top:
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-2672, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #2
  %ptls = bitcast i8* %ptls_i8 to %jl_value_t***
  %1 = alloca [9 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 0
  %2 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 2
  %3 = bitcast %jl_value_t** %2 to i8*
  call void @llvm.memset.p0i8.i32(i8* %3, i8 0, i32 56, i32 8, i1 false)
  %4 = bitcast [9 x %jl_value_t*]* %1 to i64*
  store i64 14, i64* %4, align 8
  %5 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 1
  %6 = bitcast i8* %ptls_i8 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = bitcast %jl_value_t** %5 to i64*
  store i64 %7, i64* %8, align 8
  store %jl_value_t** %.sub, %jl_value_t*** %ptls, align 8
  %9 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 8
  %10 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 7
  %11 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 6
  %12 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 5
  %13 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 4
  %14 = getelementptr [9 x %jl_value_t*], [9 x %jl_value_t*]* %1, i64 0, i64 3
  %15 = mul i64 %0, 3
  store %jl_value_t* inttoptr (i64 140569349223256 to %jl_value_t*), %jl_value_t** %2, align 8
  %16 = call %jl_value_t* @jl_box_int64(i64 signext %15)
  store %jl_value_t* %16, %jl_value_t** %14, align 8
  %17 = call %jl_value_t* @jl_box_int64(i64 signext %0)
  store %jl_value_t* %17, %jl_value_t** %13, align 8
  %18 = call %jl_value_t* @jl_box_int64(i64 signext %0)
  store %jl_value_t* %18, %jl_value_t** %12, align 8
  %19 = call %jl_value_t* @jl_box_int64(i64 signext %0)
  store %jl_value_t* %19, %jl_value_t** %11, align 8
  %20 = call %jl_value_t* @jl_box_int64(i64 signext %0)
  store %jl_value_t* %20, %jl_value_t** %10, align 8
  %21 = call %jl_value_t* @jl_box_int64(i64 signext %0)
  store %jl_value_t* %21, %jl_value_t** %9, align 8
  %22 = call %jl_value_t* @julia_afoldl_71392(%jl_value_t* inttoptr (i64 140569394913224 to %jl_value_t*), %jl_value_t** %2, i32 7)
  %23 = bitcast %jl_value_t* %22 to i64*
  %24 = load i64, i64* %23, align 16
  %25 = load i64, i64* %8, align 8
  store i64 %25, i64* %6, align 8
  ret i64 %24
}

Break up with parenthesis:

julia> g2(b) = (b+b+b+b)+(b+b+b+b);

julia> @code_llvm g2(b)

define i64 @julia_g2_71744(i64) #0 {
top:
  %1 = shl i64 %0, 3
  ret i64 %1
}

#4

I’m not sure that @kristoffer.carlsson’s suggestion is directly relevant since JuMP doesn’t compile expressions (but worth a try in any case).

@mauro, if you could do some memory profiling of JuMP/ReverseDiffSparse and identify the culprits, that would help us understand the issue here.


#5

@mlubin Ah, sorry for the noise then!


#6

Hi all,

I tried to do some memory profiling as suggested, but the only thing I was able to obtain was an almost empty file with no useful informations. I then tried to avoid that long expression in the objective, and after some tries I came up with the following simpler example

using JuMP

nT = 3000
Q = sprand(nT,nT,0.1)
Q = (Q+Q')/2 + nT*speye(nT)
L = find(Q)
I, J = ind2sub(size(Q),L)

m = Model()
@variable(m, x[1:nT])
@NLobjective(m, Min, sum( Q[L[k]]*x[I[k]]*x[J[k]] for k=1:length(L)))

solve(m)

You can see here the same surprisingly high memory requirements as in my first example (2.8Gb on my machine), but now the objective is really simple (the quadratic form associated with the matrix Q). Is there a less consuming way to define the problem?


#7

Hi,

Just wanted to say that I am also encountering these issues. I generate some data (50,000 observations), which takes approximately 72 MB, but when I optimize in JuMP, the system uses 5.2 GB.

I’m using Ipopt with @NLobjective, just like @mauro. I’m doing maximum likelihood estimation of a fairly straightforward model (multinomial logistic regression). Happy to post code if desired, but it’s a bit longer than @mauro’s.

Tyler


#8

@tyleransom, one thing that I’ve found is useful for regression models is to pick out the linear subexpressions containing the data into separate variables and linear equality constraints, e.g., instead of min ||Ax-b||^2 write min ||y||^2 s.t. y == Ax-b. Same idea works for logistic regression. This ends up giving you a much sparser Hessian matrix and is less taxing on JuMP’s AD.


#9

Thanks, @miles.lubin. For clarification, I’ll see if I can implement your suggestion for the file mle.jl on JuMP’s example page, but modified to allow for parameters ß (so now μ = y-X*ß).

How I would normally do this is:

...
@variable(m, ß[i=1:size(X,2)], start = startval[i])
@variable(m, σ >= 0.0        , start = 1.0        )

@NLobjective(m, Max, (n/2)*log(1/(2*π*σ^2))-sum((y[i]-sum(X[i,k]*ß[k] for k=1:size(X,2)))^2 for i=1:size(X,1))/(2σ^2))
...

But what you are suggesting is the following?

...
@variable(m, ß[i=1:size(X,2)], start = startval[i])
@variable(m, σ >= 0.0        , start = 1.0        )

@constraint(m, dataXparms[i=1:size(X,1)]==y[i]-sum(X[i,k]*ß[k] for k=1:size(X,2)))

@NLobjective(m, Max, (n/2)*log(1/(2*π*σ^2))-sum(dataXparms[i])^2 for i=1:size(X,1))/(2σ^2))
...

Thanks,
Tyler


#10

Yes, but the syntax is off. You need to declare the variables dataXparams separately from the linear constraints.


#11

Thanks!


#12

@miles.lubin I’m probably missing something, but I’m not seeing how your suggestion to re-express the problem as min ||y||^2 s.t. y == Ax-b is helpful for memory management. Is it because I’m not matrix multiplying the constraints? (I thought this wasn’t allowed)

If I run something akin to the file I previously shared, JuMP takes forever if n is sufficiently large, because there are n constraints (dataXparms[i=1:size(X,1)]==...) on the objective function, where n=size(X,1). Moreover, JuMP appears to use more memory in this instance (presumably because of the large number of constraints). I’m not sure if the regression applications you’ve used have relatively small n.

For n=300: JuMP uses 70.3 MB and takes 3.6s when the additional n constraints are used.
For n=3000: JuMP’s memory usage approached 2 GB and was still running after 300s when I gave up.

If I put y[i]-sum(X[i,k]*ß[k] for k=1:size(X,2)) directly into the objective function, then the usage is:
n=300: 8 MB, time is <1s
n=3000: 70 MB, time is 1.2s
n=30000: 654 MB, time is 10.8s

Related to OP, the JuMP memory usage is ~30x-40x larger than the data itself. Adding the n constraints doesn’t seem to solve the problem. It appears to be directly proportional to n, since a 10x scale-up of n roughly increases JuMP’s usage by a factor of 10.

I’m not sure that this memory usage is a problem per se, but I don’t recall encountering this in previous version of JuMP.

Big shout out to you and Iain and Joey et al. for developing this—it’s amazing!


#13

No, it’s because the matrix of second-order derivatives of the nonlinear part becomes diagonal instead of dense, and the expression graph of the nonlinear part becomes much simpler. Whether this ends up helping or hurting depends on the specific optimization model and the solver used, so I guess it wasn’t the right answer for your case.

It would be useful to break this usage down into the memory used by JuMP’s AD code, by Julia’s GC, and by the nonlinear solver itself. You can manually request derivatives from JuMP and track the memory usage. If the second-order derivative computations are responsible, you could use a solver/method that doesn’t request them.

Nothing substantial has changed in JuMP’s AD code over the past year. There are certainly optimizations that have not been tried and could be made for data-heavy models like these. There will always be some cost to pay for JuMP’s generality in derivative computations over specialized methods for logistic regression, for example, but I agree that 30x-40x memory usage seems a bit excessive (to be fair, you should compare with the memory used by some other second-order logistic regression solvers and not just the data itself).

I have no plans to spend time pursuing improvements here, but it would be a fun project for whoever’s motivated to do so.


#14

Thanks, @miles.lubin. I’ll post another update here with the profiling results, but I won’t bother you anymore. Really appreciate the help!


#15

For posterity’s sake, I profiled memory allocation throughout my JuMP call. Here is the corresponding gist.

Summary of findings:

  • Data generation requires 15MB of memory allocation.
  • JuMP allocates ~600 MB at the solve() call
  • Another ~600 MB are allocated at the MathProgBase.initialize() call.
  • These two lines account for almost the entire 1.22GB that are allocated for the entire optimization.
  • When breaking down the MathProgBase.initialize() call, it appears that the second-order derivative computation is what is driving this, as it accounts for 79% of the allocated memory.

As Miles mentioned above, this doesn’t seem to be unintended behavior, but looks more like an issue with the second-order derivatives being too dense. This may also be what is driving @mauro’s problem, too.


Julia vs R vs Python: simple MLE problem
#16

Hi there,

I too have come across the memory issue, but in my case I use 25000 variables with julia v0.6 and the latest JuMP. This balloons to 50GB memory use when I execute the Ipopt solver. I ran whos() and it reports 364MB for the model, but also for each of the three arrays. I’m wondering whether a large number of deep copies of the model are being generated instead of pointers, since Ipopt reports only 23e6 non-zeros in the Jacobian of the equality constraints and 23e3 non-zeros in the Lagrangian Hessian. Here’s the relevant code:

Dim= 1000
m=Model()
@variable(m, v[1:Dim])
@variable(m, meps[1:Dim])
@variable(m, mzeta[1:prod(size(IWF[:,:,1:Dim]))])
@constraint(m, meps .== E[1:Dim]-Kernel*v)
DK=reshape(DerivativeKernel,(21*Dim,Dim))
@constraint(m,mzeta .== reshape(IWF[:,:,1:Dim],21*Dim)-DK*v)
@variable(m,Kv[1:Dim])
@constraint(m,Kv .== Kernel*v)
@objective(m, Min, 0.5*sum(meps.^2)*1e-6 + 0.5*sum(mzeta.^2) + 0.0005*sum(v.*Kv))
setsolver(m,IpoptSolver())
solve(m)

Kernel and DK are appropriately sized matrices.

Inserting

@variable(m,Kv[1:Dim])
@constraint(m,Kv .== Kernel*v)

and replacing v.*(Kernel*v) with v.*Kv actually decreased Ipopt efficiency, but the memory problem remains the same.


#17

This is super late, but thanks for tacking on to this discourse! I’m not sure what to do about these issues because I love JuMP but this memory overhead is cumbersome.