Add nested optimization problem in JuMP

I want to include an inner loop optimization (a fixed point iteration algorithm) inside my current optimization problem. I cannot figure out how to make it work… Any help/guidance/advice is truly appreciated!

The current optimization code is ipopt_jump (thanks to the help of @odow in this post). It tries to find theta[1:K, 1:L] and delta[1:H] to maximize the objective function. The dimension of theta is relatively small (K = 11, L = 5), but the dimension of delta, H, is very large, with H > 2000. Optimizing over more than 2000 parameters takes forever to run.

There is a trick: the value of delta can be calculated for any theta using a fixed point algorithm, so I would only need to maximize over K*L = 55 parameters instead of more than 2055.

delta can be expressed as a function of theta through the mapping \delta_h' = \delta_h + \ln s_h - \ln \hat{s}_h(\delta, \theta), which is iterated until \delta' = \delta. Here \ln s_h is observed from the data, and \hat{s}_h(\delta, \theta) = \frac{1}{N} \sum_{i=1}^N \frac{\exp(\delta_h + X_h \theta Z_i')}{\sum_{h'} \exp(\delta_{h'} + X_{h'} \theta Z_i')}.
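
In code, one update of this mapping might look like the following sketch (update_delta and log_s_h are names introduced here only for illustration; X is H×K, Z is N×L, and log_s_h holds the observed \ln s_h):

using Statistics  # mean

# One update delta' = delta + ln s_h - ln ŝ_h(delta, theta); purely illustrative.
function update_delta(delta, theta, X, Z, log_s_h)
    ev = exp.(delta .+ X * theta * Z')               # exp(delta_h + X_h*theta*Z_i') for all h, i
    s_ih = ev ./ sum(ev; dims = 1)                   # individual choice probabilities
    log_pred_s_h = log.(vec(mean(s_ih; dims = 2)))   # ln ŝ_h(delta, theta)
    return delta .+ log_s_h .- log_pred_s_h          # delta'
end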

I want to add an inner loop that solves for \delta(\theta) given \theta, so that I only need to optimize over \theta; the outer loop maximizes the objective function from ipopt_jump. I tried to incorporate the fixed point algorithm in ipopt_fp_jump below, but it doesn’t work because of type incompatibility…

A data sample can be found here.

using JuMP, Ipopt

function ipopt_jump(data, iter=10000, tol=1e-3)
    X, Z, choice = data
    N, L = size(Z)
    H, K = size(X)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    set_attributes(model, "tol" => tol, "max_iter" => iter)
    @variable(model, theta[1:K, 1:L])
    @variable(model, delta[1:H])
    @expression(model, v, delta .+ (X * theta * Z'))
    @NLexpression(model, evsum[i = 1:N], sum(exp(v[h, i]) for h = 1:H) + 1)
    choice_h = sum(choice; dims = 1)
    @NLobjective(
        model,
        Max,
        sum(choice[h, i] * v[h, i] for h = 1:H, i = 1:N) - 
        sum(choice_h[i] * log(evsum[i]) for i = 1:N)
    )
    optimize!(model)
    return objective_value(model), value.(theta), value.(delta)
end

using LinearAlgebra  # norm, used in the fixed point loop

function ipopt_fp_jump(data, iter=10000, tol=1e-3)
    X, Z, choice, log_obs_s_h = data
    N, L = size(Z)
    H, K = size(X)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    set_attributes(model, "tol" => tol, "max_iter" => iter)

    @variable(model, theta[1:K, 1:L])


    # Attempt to build ln ŝ_h(δ, θ) out of JuMP expressions: does not work..
    function calc_pred_s_h(θ, δ)
        @expression(model, v, δ .+ (X * θ * Z'))
        @NLexpression(model, evsum[i = 1:N], sum(exp(v[h, i]) for h = 1:H) + 1)
        @NLexpression(model, s_ih[h = 1:H, i = 1:N], exp(v[h, i]) / evsum[i])
        @NLexpression(model, s_h[h = 1:H], sum(s_ih[h, i] for i = 1:N)/N)  
        @NLexpression(model, ln_s_h[h = 1:H], log(s_h[h]))
        return ln_s_h
    end

    # same function as above: does not work..
    # function calc_s_h(θ, δ)
    #     ev = exp.(δ .+ (X * θ * Z'))
    #     evsum = sum(ev, dims = 1)
    #     s_ih = ev ./ evsum
    #     s_h = mean(s_ih, dims = 2)
    #     return log.(s_h)
    # end
    
    function fixed_point_algorithm(θ)
        old_delta = rand(H)
        new_delta = old_delta  # initialize outside the while loop so it is in scope at the return
        normdiff = Inf

        while normdiff > tol
            new_delta = old_delta .+ log_obs_s_h .- calc_pred_s_h(θ, old_delta)
            normdiff = norm(new_delta - old_delta)
            old_delta = new_delta
        end
        
        return new_delta
    end

    delta = fixed_point_algorithm(theta)  # theta is a matrix of JuMP variables here, not numbers

    @expression(model, v, delta .+ (X * theta * Z'))
    @NLexpression(model, evsum[i = 1:N], sum(exp(v[h, i]) for h = 1:H) + 1)
    choice_h = sum(choice; dims = 1)

    @NLobjective(
        model,
        Max,
        sum(choice[h, i] * v[h, i] for h = 1:H, i = 1:N) - 
        sum(choice_h[i] * log(evsum[i]) for i = 1:N)
    )
    optimize!(model)
    return objective_value(model), value.(theta), value.(delta)
end

Why does it currently fail?

You cannot do what you’re trying to do in ipopt_fp_jump because you’re passing in theta::Matrix{VariableRef}, so the fixed point is computed on the JuMP variable objects themselves, not on the values those variables take in a solution.
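
A minimal illustration of the issue (the model and variable names here are only for demonstration):

using JuMP

model = Model()
@variable(model, x[1:2])

d = x .+ 1.0   # a Vector{AffExpr}: symbolic expressions attached to `model`, not numbers
# A numeric stopping test such as `normdiff > tol` cannot be evaluated on objects like
# these, because the variables have no values until the model has been solved.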

The usual work-around would be to use a user-defined function (see the Nonlinear Modeling section of the JuMP documentation), but these require scalar outputs.

Nothing immediately comes to mind for a good way to solve this problem with JuMP. Others might have a suggestion.

I’ll note that this problem is unconstrained, so you might have better luck trying something like Optim.jl: if you write the whole thing as a pure Julia function of theta (including the fixed point calculation), it might just work.
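
To make that concrete, here is a minimal, untested sketch of the idea. The names optim_fp, solve_delta, and negloglik are made up, LBFGS with forward-mode autodiff is just one possible choice of method, and the .+ 1 in the shares mirrors the outside-option term in ipopt_jump’s evsum:

using LinearAlgebra, Optim, Statistics

function optim_fp(data; iter = 10_000, tol = 1e-3)
    X, Z, choice, log_obs_s_h = data
    N, L = size(Z)
    H, K = size(X)
    choice_h = vec(sum(choice; dims = 1))

    # Inner loop: solve delta(theta) by iterating delta' = delta + ln s_h - ln ŝ_h(delta, theta).
    function solve_delta(theta)
        delta = zeros(eltype(theta), H)
        for _ in 1:iter
            ev = exp.(delta .+ X * theta * Z')
            s_h = vec(mean(ev ./ (sum(ev; dims = 1) .+ 1); dims = 2))
            new_delta = delta .+ log_obs_s_h .- log.(s_h)
            norm(new_delta - delta) < tol && return new_delta
            delta = new_delta
        end
        return delta
    end

    # Outer objective: the same log-likelihood as in ipopt_jump, but as a function of theta only.
    function negloglik(theta_vec)
        theta = reshape(theta_vec, K, L)
        delta = solve_delta(theta)
        v = delta .+ X * theta * Z'
        evsum = vec(sum(exp.(v); dims = 1)) .+ 1
        ll = sum(choice .* v) - sum(choice_h .* log.(evsum))
        return -ll  # Optim minimizes
    end

    res = optimize(negloglik, zeros(K * L), LBFGS(); autodiff = :forward)
    return -Optim.minimum(res), reshape(Optim.minimizer(res), K, L)
end

(Note that ForwardDiff here differentiates straight through the fixed-point iterations, which may be slow when H > 2000.)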

Shameless plug: my package ImplicitDifferentiation.jl is very well-suited to computing gradients of fixed points. Combining it with a first-order algorithm from Optim.jl or Optimization.jl for the outer loop might work.

Hi gdalle. It fails because ln_s_h is a JuMP expression object, which cannot be used directly inside the fixed point function.