Performance issue when using Kronecker product with ForwardDiff.Dual type

Hello!

I’m new to Julia, and I’m working on a constrained optimization problem using JuMP.

I’m having a performance issue when calculating a Kronecker product in a function that receives the type ForwardDiff.Dual. In particular, I think I might have a type-stability issue, since the Kronecker product seems to produce a large number of allocations and use a lot of memory. However, when using @code_warntype, either on the optimization() function or on the functions inside it, I wasn’t able to find the problem.
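
(For reference, the allocation behaviour of kron with Dual element types can be checked in isolation with something like the rough sketch below; it is separate from the model code further down.)

using ForwardDiff, LinearAlgebra

d = ForwardDiff.Dual(1.0, 1.0)    # a Dual number carrying a single partial
a = fill(d, 120)
b = fill(d, 20)
kron(a, b)                        # warm up / compile
@show @allocated kron(a, b)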

I would greatly appreciate it if someone could take a look and see whether I’m doing something wrong or whether there are tweaks that could improve performance. Below is a simplified example of my code:

using Pkg
Pkg.add("DataFrames")
Pkg.add("StatFiles")
Pkg.add("Queryverse")
Pkg.add("LinearAlgebra")
Pkg.add("CSV")
Pkg.add("Distributions")
Pkg.add("JuMP")
Pkg.add("Ipopt")
Pkg.add("ForwardDiff")
using DataFrames, StatFiles, Queryverse, LinearAlgebra, CSV, Distributions, JuMP, Ipopt, ForwardDiff

####################################
############   Data   ##############
####################################

const s_t = rand(Float64, (20,8000))
const d_t = rand(Float64, (20,8000))
const d_t_lag = rand(Float64, (20,8000))
const i_t = rand(Float64, (120,8000))

const N = size(s_t, 1)
const K = size(s_t, 2)
const L = floor(Int, size(i_t, 1)/N)

####################################
########## Optimization ############
####################################

# We define a simple macro to time the process
macro time_calc_e(expr)
    quote
        print("calc_e:")
        @time $(esc(expr))
    end
end

macro time_kronecker(expr)
    quote
        print("kronecker:")
        @time $(esc(expr))
    end
end

# Set up JuMP problem
# We use the solver 'Ipopt'

function optimization()

    # We calculate the error evaluating the parameters in the calc_e function
    function calc_e(β::T, γ::T, W::T...) where {T <: Real}
        M = Matrix{T}(reshape([i for i in W], N, N))
        e_t = zeros(T, (N, K))
        # Note: the for loop to calculate e is faster than the direct matrix multiplication.
        @time_calc_e for t in 1:K
            e_t[:,t] = s_t[:,t] - β*d_t[:,t] - γ*M*d_t_lag[:,t]
        end
        g = zeros(T, (L*(N^2), 1))
        @time_kronecker for t = 1:K
            g .+= @views kron(i_t[:,t], e_t[:,t])
        end
        G = g'*I*g
        return G[1]
    end

    # We calculate the penalization term ∑|W_i,j|
    function penalization_1(W::T...) where {T <: Real}
        pen1 = Vector{T}(undef, 1)
        pen1 = sum(abs.([i for i in W]))
        return pen1[1]
    end

    # We calculate the penalization term ∑|W_i,j|^2
    function penalization_2(W::T...) where {T <: Real}
        pen2 = Vector{T}(undef, 1)
        pen2 = sum((abs.([i for i in W])).^2)
        return pen2[1]
    end

    # We compute the value of the (penalized) GMM objective function
    function gmm_pen(G::T, pen_1::T, pen_2::T) where {T <: Real}
        p1 = 0.5
        p2 = 0.5
        G_pen = G + p1*pen_1 + p2*pen_2
        return G_pen
    end

    # We create the start values for the matrix of interactions
    W0 = rand(N,N)

    # We create a model by passing the optimizer 'Ipopt' to the Model function
    m = Model(Ipopt.Optimizer; add_bridges = false)

    # We set some of the attributes of the optimizer 'Ipopt'
    set_optimizer_attribute(m, "max_iter", 500)

    # We register the function that returns the matrix of errors
    JuMP.register(m, :calc_e, 402, calc_e; autodiff = true)

    # We register the function that returns the first penalization term
    JuMP.register(m, :penalization_1, 400, penalization_1; autodiff = true)

    # We register the function that returns the second penalization term
    JuMP.register(m, :penalization_2, 400, penalization_2; autodiff = true)

    # We register the function that returns the value of the objective function
    JuMP.register(m, :gmm_pen, 3, gmm_pen; autodiff = true)

    # We create the variables (the objects that change in the optimization)
    @variable(m, β, start = -1.0)
    @variable(m, γ, start = -1.0)
    @variable(m, W[i = 1:N, j = 1:N], start = W0[i,j])

    # We create constraints on the variables
    @constraint(m, [i=1:N], W[i,i] == 0.0) # The diagonal of W is zero.
    @constraint(m, [i=1:N], sum(W[i,:]) == 1.0) # The row-sum of W is one.

    # We want to minimize the objective function value
    @NLobjective(m, Min, gmm_pen(calc_e(β, γ, W...), penalization_1(W...), penalization_2(W...)))

    # We can call print to display the model
    print(m)

    # To find the solution of the (penalized) GMM we use optimize!
    println("Solving...")
    optimize!(m)

    return m
end

model = optimization()

Thank you very much in advance!


Hi there! Thanks for making a great reproducible example to start with. You can make it a little better by excluding the packages that aren’t relevant. For example, I didn’t need to download and install Queryverse to run this example.

Two high level comments:

  • While JuMP supports user-defined functions, they should be used sparingly. Where possible, write out your expressions algebraically. This lets JuMP “see” the expression, so it can perform more analysis (like computing second derivatives).
  • By default, user-defined functions use ForwardDiff to compute derivatives. Forward-mode automatic differentiation scales with the length of the input, so a function with 400 inputs will take ~400 times longer to evaluate the gradient than to evaluate the function itself (see the sketch after this list).
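
Here’s a rough stand-alone sketch of that scaling (my own illustration with a toy function f, not your code), showing how the gradient cost grows with the number of inputs:

using ForwardDiff

f(x) = sum(abs2, x)    # stand-in for a user-defined objective

for n in (4, 40, 400)
    x = rand(n)
    f(x); ForwardDiff.gradient(f, x)                       # warm up / compile
    t_f = @elapsed for _ in 1:100; f(x); end
    t_g = @elapsed for _ in 1:100; ForwardDiff.gradient(f, x); end
    println("n = $n: gradient ≈ ", round(t_g / t_f; digits = 1), "x the cost of f")
end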

So as you can see, the combination of many user-defined functions with large inputs is what gets you into trouble. There are also a bunch of unnecessary allocations, like the Vector{T}(undef, 1) that is created and then immediately discarded, which don’t help.
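
For instance, the two penalization helpers could be written without any temporary arrays at all (a small sketch that keeps your original signatures):

penalization_1(W::T...) where {T<:Real} = sum(abs, W)
penalization_2(W::T...) where {T<:Real} = sum(abs2, W)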

I’d write your code a completely different way, as follows:

function optimization()
    model = Model(Ipopt.Optimizer)
    @variables(model, begin
        β, (start = -1.0)
        γ, (start = -1.0)
        W[i = 1:N, j = 1:N], (start = rand())
        penalty_1
        penalty_2
        g[i=1:size(i_t, 1), j=1:N]
    end)
    @expressions(model, begin
        e_t, s_t .- β * d_t .- γ * W * d_t_lag
    end)
    @constraints(model, begin
        [i=1:N], W[i, i] == 0.0
        [i=1:N], sum(W[i, :]) == 1.0
        [i=1:size(i_t, 1), j=1:N], g[i, j] == sum(i_t[i, t] * e_t[j, t] for t in 1:K)
    end)
    @NLconstraints(model, begin
        penalty_1 == sum(abs(w) for w in W)
        penalty_2 == sum(w^2 for w in W)
    end)
    @NLobjective(
        model, 
        Min, 
        sum(g[i, j]^2 for i in 1:size(i_t, 1), j in 1:N) +
        0.5 * penalty_1 + 0.5 * penalty_2,
    )
    optimize!(model)
    return model
end

model = optimization()

This runs, but it encounters an error. Unless I made a mistake somewhere, this is most likely because of the abs: Ipopt assumes functions are smooth and twice differentiable, so it often has numerical problems with abs.

Using the standard reformulation trick for abs (introduce y_1, y_2 >= 0 with y_1 - y_2 == x, so that abs(x) := y_1 + y_2 at the optimum), I obtained:

function optimization()
    model = Model(Ipopt.Optimizer)
    @variables(model, begin
        β, (start = -1.0)
        γ, (start = -1.0)
        W[i = 1:N, j = 1:N], (start = rand())
        penalty_1
        penalty_2
        g[i=1:size(i_t, 1), j=1:N]
        W_plus[1:N, 1:N] >= 0
        W_neg[1:N, 1:N] >= 0
    end)
    @expressions(model, begin
        e_t, s_t .- β * d_t .- γ * W * d_t_lag
    end)
    @constraints(model, begin
        [i=1:N], W[i, i] == 0.0
        [i=1:N], sum(W[i, :]) == 1.0
        [i=1:size(i_t, 1), j=1:N], g[i, j] == sum(i_t[i, t] * e_t[j, t] for t in 1:K)
        W_plus .- W_neg .== W
        penalty_1 == sum(W_plus) + sum(W_neg)
        penalty_2 == sum(w^2 for w in W)
    end)
    @NLobjective(
        model, 
        Min, 
        sum(g[i, j]^2 for i in 1:size(i_t, 1), j in 1:N) +
        0.0 * penalty_1 + 1 * penalty_2,
    )
    optimize!(model)
    return model
end

model = optimization()

This solves in 10 iterations and 0.5 seconds, but I’ll leave it to you to check the solution is reasonable and that I didn’t make a typo somewhere.
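
For example, a quick sanity check could look something like this (a sketch using JuMP’s standard accessors; the variables are looked up by the names used in @variables):

@show objective_value(model)
@show value(model[:β]), value(model[:γ])
W_hat = value.(model[:W])
@show maximum(abs, [W_hat[i, i] for i in 1:N])    # diagonal should be ≈ 0
@show maximum(abs, sum(W_hat; dims = 2) .- 1)     # row sums should be ≈ 1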


I really appreciate you taking the time to put together such a complete answer. Thank you! I need to check a couple of details tomorrow about dimensions and the abs trick, but from a broader perspective, your explanation of why JuMP performs better this way and your approach to solving my problem make total sense and look good. Super helpful!


The abs trick is

# Can't use `abs(x)` because it is nonlinear, and non-smooth
model = Model()
@variable(model, x)
@objective(model, Min, abs(x))

# Option 1: y := abs(x); y >= x; y >= -x
model = Model()
@variable(model, x)
@variable(model, y)
@constraint(model, y >= x)
@constraint(model, y >= -x)
@objective(model, Min, y)

# Option 2: y1 + y2 := abs(x); y1 - y2 == x; y1, y2 >= 0
model = Model()
@variable(model, x)
@variable(model, y[1:2])
@constraint(model, y[1] - y[2] == x)
@objective(model, Min, y[1] + y[2])

Note that this works only if you are minimizing “down” into the convex shape of abs(x). It won’t work if you are maximizing “up”, because then the reformulation is no longer equivalent and the problem is non-convex.

There’s also a third option, which is to let JuMP do the reformulation:

# Option 3: use MOI.NormOneCone
model = Model()
@variable(model, x[1:2, 1:2])
@variable(model, t)
@constraint(model, [t; vec(x)] in MOI.NormOneCone(length(x)+1))
@objective(model, Min, t)
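
Applied to the W matrix from your model, that third option would look something like this (a sketch; because penalty_1 is being minimized, the cone constraint penalty_1 >= sum(abs.(W)) is tight at the optimum, and JuMP’s bridges reformulate it into linear constraints that Ipopt can handle):

model = Model()
@variable(model, W[1:N, 1:N])
@variable(model, penalty_1)
@constraint(model, [penalty_1; vec(W)] in MOI.NormOneCone(N^2 + 1))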

Hello. Thank you again for your helpful response! I checked everything in detail and it works great.

May I ask one last question related to this problem? The dimensions of the actual input data are larger than what I used for my example above. Now I’ve been struggling to solve the entire problem, since it takes a long time to compile and iterate. Below you can find the code I’ve been trying to run:

using Pkg
Pkg.add("DataFrames")
Pkg.add("StatFiles")
Pkg.add("LinearAlgebra")
Pkg.add("CSV")
Pkg.add("Distributions")
Pkg.add("JuMP")
Pkg.add("Ipopt")
Pkg.add("ForwardDiff")
using DataFrames, StatFiles, LinearAlgebra, CSV, Distributions, JuMP, Ipopt, ForwardDiff

####################################
############   Data   ##############
####################################

rows = 1500 # This is the dimension that brings complications.
cols = 20000

const s_t = rand(Float64, (rows,cols))
const d_t = rand(Float64, (rows,cols))
const d_t_lag = rand(Float64, (rows,cols))
const i_t = rand(Float64, (rows*6,cols))

####################################
########### Parameters #############
####################################

const N = size(s_t, 1)
const K = size(s_t, 2)
const L = floor(Int, size(i_t, 1)/N)

####################################
########## Optimization ############
####################################

function optimization()
    model = Model(Ipopt.Optimizer)
    @variables(model, begin
        β, (start = -1.0)
        γ, (start = -1.0)
        W[i = 1:N, j = 1:N], (start = rand())
        penalty_1
        penalty_2
        g[i=1:size(i_t, 1), j=1:N]
        W_plus[1:N, 1:N] >= 0
        W_neg[1:N, 1:N] >= 0
    end)
    @expressions(model, begin
        e_t, s_t .- β * d_t .- γ * W * d_t_lag
    end)
    @constraints(model, begin
        [i=1:N], W[i, i] == 0.0
        [i=1:N], sum(W[i, :]) == 1.0
        [i=1:size(i_t, 1), j=1:N], g[i, j] == sum(i_t[i, t] * e_t[j, t] for t in 1:K)
        W_plus .- W_neg .== W
        penalty_1 == sum(W_plus) + sum(W_neg)
        penalty_2 == sum(w^2 for w in W)
    end)
    @NLobjective(
        model, 
        Min, 
        sum(g[i, j]^2 for i in 1:size(i_t, 1), j in 1:N) +
        0.5 * penalty_1 + 0.5 * penalty_2,
    )
    optimize!(model)
    return model
end

model = optimization()

solution = all_variables(model)
solution_df = DataFrame(name = name.(solution), value = value.(solution))

Is there a way of making this code run faster? I guess that when the number of rows grows, the problem becomes much heavier because of the auxiliary variables used to handle the abs value of W, but I’m not sure how to circumvent this issue.

Thank you again for any help or suggestion you may have.

With rows = 1_500 and cols = 20_000, I get:

julia> model
A JuMP Model
Feasibility problem with:
Variables: 20250004

2e7 variables is a very large problem. Do you really need the L1 norm?

Why have γ * W? Could you not instead use [i=1:N], sum(W[i, :]) == γ? The factor of γ will show up in your penalties, but they seem somewhat arbitrary anyway.

Hey! Thanks for your response. In this problem the L1 norm is essential, since it plays a role in the properties of the estimator.

On the other hand, I guess I could get rid of γ, but I don’t see clearly how this substantially reduces the scale of the optimization problem. I’m sorry if I’m missing something in your explanation.

Removing gamma lets you write it as a convex quadratic program. The previous formulation ended up as a non-convex program with quartic terms of the form gamma^2 * W * W: each entry of g contains γ * W * d_t_lag, so squaring g produces products of γ² with two entries of W, which are degree four in the decision variables.

I didn’t test, but something like:

function optimization(;rows, cols)
    s_t = rand(Float64, (rows, cols))
    d_t = rand(Float64, (rows, cols))
    d_t_lag = rand(Float64, (rows, cols))
    i_t = rand(Float64, (6rows, cols))
    N = size(s_t, 1)
    model = Model()
    @variables(model, begin
        β, (start = -1.0)
        W_plus[1:N, 1:N] >= 0, (start = rand())
        W_neg[1:N, 1:N] >= 0
    end)
    @expressions(model, begin
        W, W_plus .- W_neg
        g, i_t * (s_t .- β * d_t .- W * d_t_lag)'
    end)
    for i in 1:N
        fix(W_plus[i, i], 0.0; force = true)
        fix(W_neg[i, i], 0.0; force = true)
    end
    @constraint(model, [i=1:N], sum(W[i, :]) == 1.0)
    @objective(
        model, 
        Min, 
        sum(gi^2 for gi in g) +
        0.5 * (sum(W_plus) + sum(W_neg)) +
        0.5 * sum(w^2 for w in W),
    )
    return model
end
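
(The sketch above only builds the model; to actually run it you would still attach a solver and call optimize!, for example:)

model = optimization(rows = 100, cols = 1_000)   # smaller sizes for a first test
set_optimizer(model, Ipopt.Optimizer)
optimize!(model)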

You might also consider bypassing JuMP and writing against the Ipopt interface directly: https://github.com/jump-dev/Ipopt.jl (the Julia interface to the Ipopt nonlinear solver).

JuMP isn’t necessarily the best tool for the job with such large problems.


Thank you for all your help! Your solution works fine with a small number of rows, but Julia crashes every time I try to increase the number of rows to around 1,000. Do you think this is a JuMP limitation? Or would I likely have the same problem if I moved to Matlab (which I would like to avoid), given the problem dimension? Thanks again. I’ve learned a lot from our interactions!

You’re likely running out of memory. 2e7 variables is a really big problem.

You probably need to go through the raw Ipopt API to minimize the amount of overhead. JuMP provides user-friendly syntax at a cost.

Here’s something to get you started. You’ll need to finish the bit "TODO: add the ...".

function optimization(;rows, cols)
    s_t = rand(Float64, (rows, cols))
    d_t = rand(Float64, (rows, cols))
    d_t_lag = rand(Float64, (rows, cols))
    i_t = rand(Float64, (6rows, cols))
    N = size(s_t, 1)
    function eval_f(x::Vector{Float64})
        W = zeros(N, N)
        for i in 1:N
            for j in 1:N        
                k_plus = i + (j - 1) * N
                k_neg = N^2 + i + (j - 1) * N
                W[i, j] = x[k_plus] - x[k_neg]
            end
        end
        β = x[end]
        g = i_t * (s_t .- β * d_t .- W * d_t_lag)'
        # Write the L1 penalty in terms of W_plus + W_neg (not abs(W)) so that
        # the objective is consistent with the gradient in eval_grad_f below.
        pen = sum(0.5 * (x[k] + x[N^2+k]) + 0.5 * W[k]^2 for k in 1:N^2)
        return sum(gi^2 for gi in g) + pen
    end
    function eval_grad_f(x::Vector{Float64}, grad_f::Vector{Float64})
        grad_f .= 0.0
        for i in 1:N
            for j in 1:N
                k_plus = i + (j - 1) * N
                k_neg = N^2 + i + (j - 1) * N
                # + 0.5 * sum(W_plus)
                grad_f[k_plus] += 0.5
                # + 0.5 * sum(W_neg)
                grad_f[k_neg] += 0.5
                # + 0.5 * sum(W^2)
                # => +0.5 * W_plus^2 + 0.5 * W_neg^2 - W_plus * W_neg
                grad_f[k_plus] += x[k_plus] - x[k_neg]
                grad_f[k_neg] += x[k_neg] - x[k_plus]
            end
        end
        # TODO: add the grad component of d(g)/dW and d(g)/dβ
        return
    end
    function eval_g(x::Vector{Float64}, g::Vector{Float64})
        for i in 1:N
            g[i] = sum(x[i + (j - 1) * N] - x[N^2 + i + (j - 1) * N] for j in 1:N)
        end
        return
    end
    function eval_jac_g(
        ::Vector{Float64},
        rows::Vector{Cint},
        cols::Vector{Cint},
        values,
    )
        if values === nothing
            # Structure phase: `rows` and `cols` are pre-sized to the number of
            # Jacobian nonzeros, so fill them by index rather than push!-ing.
            nnz = 0
            for i in 1:N, j in 1:N
                rows[nnz+1] = Cint(i)
                cols[nnz+1] = Cint(i + (j - 1) * N)
                rows[nnz+2] = Cint(i)
                cols[nnz+2] = Cint(N^2 + i + (j - 1) * N)
                nnz += 2
            end
        else
            nnz = 0
            for _ in 1:N, _ in 1:N
                values[nnz+1] = 1.0
                values[nnz+2] = -1.0
                nnz += 2
            end
        end
        return
    end
    x_upper = fill(Inf, 2 * N^2 + 1)
    for i in 1:N
        x_upper[i + (i - 1)*N] = 0.0
        x_upper[N^2 + i + (i - 1)*N] = 0.0
    end
    prob = Ipopt.CreateIpoptProblem(
        2 * N^2 + 1,                    # num_variables
        vcat(zeros(2 * N^2), -Inf),     # x_lower: W_plus, W_neg >= 0; β is free
        x_upper,
        N,                              # num_constraints
        fill(1.0, N),                   # g_L
        fill(1.0, N),                   # g_U
        2 * N^2,                        # nnz in Jacobian
        0,                              # nnz in Hessian
        eval_f,
        eval_g,
        eval_grad_f,
        eval_jac_g,
        nothing,
    )
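    # Note: since no Hessian callback is provided above (eval_h = nothing and
    # 0 Hessian nonzeros), you will likely also need a quasi-Newton Hessian:
    # Ipopt.AddIpoptStrOption(prob, "hessian_approximation", "limited-memory")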
    stat = Ipopt.IpoptSolve(prob)
    return stat
end
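
In case it’s useful, here is one way the missing piece could be filled in. This is my own untested derivation (add_g_gradient! is just a hypothetical helper name), so please double-check the algebra against the variable layout above (x[1:N^2] = vec(W_plus), x[N^2+1:2N^2] = vec(W_neg), x[end] = β):

function add_g_gradient!(grad_f, x, s_t, d_t, d_t_lag, i_t, N)
    W = reshape(x[1:N^2] .- x[N^2+1:2N^2], N, N)
    β = x[end]
    g = i_t * (s_t .- β * d_t .- W * d_t_lag)'        # same g as in eval_f
    # dg/dβ = -i_t * d_t', so d(sum(g.^2))/dβ = -2 * sum(g .* (i_t * d_t'))
    grad_f[end] += -2 * sum(g .* (i_t * d_t'))
    # d(sum(g.^2))/dW = -2 * g' * (i_t * d_t_lag'), an N × N matrix
    GW = -2 * g' * (i_t * d_t_lag')
    for j in 1:N, i in 1:N
        k_plus = i + (j - 1) * N
        k_neg = N^2 + i + (j - 1) * N
        grad_f[k_plus] += GW[i, j]    # dW[i,j]/dW_plus[i,j] = +1
        grad_f[k_neg] -= GW[i, j]     # dW[i,j]/dW_neg[i,j]  = -1
    end
    return grad_f
end

You would then call add_g_gradient!(grad_f, x, s_t, d_t, d_t_lag, i_t, N) from eval_grad_f in place of the TODO comment.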