Scaling issues in a nonlinear program

Dear all, I’m trying to solve a nonlinear constrained program that minimizes the sum of squared relative deviations between the data and their model counterparts. I’m solving it with Ipopt through JuMP. I have data on total population and on the number of university workers. Naturally, these variables differ hugely in order of magnitude (a ratio of ca. 1/1e7). So far, I have scaled these variables inconsistently (so that the ratio becomes 1/1e3 instead of 1/1e7) and managed to solve the program encountering no particular issues. However, I would like to solve the problem with consistent scales. I tried rescaling some of the constraints, but it feels like a gamble and did not get me far. I also tried eliminating as many variables as possible, but that did not help either. Do you have any recommendations? Has anyone encountered a similar problem? The objective and the constraints are below.

The objective reads:

# Objective
@NLobjective(model, Min, sum(
    sum(((data_1[!, Symbol(moment)][t] - vars[Symbol(moment)][t+1]) / data_1[!, Symbol(moment)][t])^2
        for moment in current_moments
        if !ismissing(data_1[!, Symbol(moment)][t]))
    for t in 1:T1
    ))

The problem is subject to a series of linear and mostly non-linear constraints of the sort:

# Constraints
# Define the constraints for t = 2 to T2+1 as t = 0, T2+2 are subject to SS constraints
    # ---- linear block ----
    @constraints(model, begin
        [t = 2:(T2+1)], vars[:g][t] == ppi * vars[:P][t+1]  # Generation
        [t = 2:(T2+1)], vars[:T][t] == vars[:g][t-1] + vars[:g][t]  # Total population
        [t = 2:(T2+1)], vars[:wpop][t] == vars[:T][t] - vars[:M][t] - vars[:M][t-1]  # Working population
        [t = 2:(T2+1)], vars[:a_bar][t] == vars[:h_bar][t]  # Intergenerational human capital transmission
    end)

    # ---- nonlinear block ----
    @NLconstraints(model, begin
        [t = 2:(T2+1)], vars[:urb][t] == vars[:Lu][t] / vars[:d][t] + vars[:M][t]+ vars[:M][t-1]   # Urban population
        [t = 2:(T2+1)], vars[:rur][t] == vars[:Lr][t] / vars[:d][t]  # Rural population
        [t = 2:(T2+1)], vars[:r_urb][t] == vars[:urb][t] / vars[:T][t]  # Urbanization rate
        [t = 2:(T2+1)], vars[:a_hat][t] == (vars[:p][t+1] / (vars[:B][t+1] * vars[:h_bar][t]^(1 - vars[:theta][] - vars[:gamma][] / 2)))^(2 / (2 * vars[:theta][] + vars[:gamma][]))  # Indifferent individual threshold
        [t = 2:(T2+1)], vars[:a_tilda][t] == (vars[:a_hat][t] * vars[:a_bar][t])^(1 / 2)  # Peer effect
        [t = 2:(T2+1)], vars[:h_bar][t] == vars[:B][t] * (vars[:a_bar][t-1])^vars[:theta][] * (vars[:a_tilda][t-1])^vars[:gamma][] * (vars[:h_bar][t-1])^(1 - vars[:gamma][] - vars[:theta][])  # Knowledge frontier
        [t = 2:(T2+1)], vars[:h_min][t] == vars[:B][t] * (vars[:a_hat][t-1])^vars[:theta][] * (vars[:a_tilda][t-1])^vars[:gamma][] * (vars[:h_bar][t-1])^(1 - vars[:gamma][] - vars[:theta][])  # Minimum human capital at university
        [t = 2:(T2+1)], vars[:g_hbar][t] == vars[:h_bar][t] / vars[:h_bar][t-1]  # Growth knowledge frontier
        [t = 2:(T2+1)], vars[:g_hmin][t] == vars[:h_min][t]/vars[:h_min][t-1]  # Growth minimum human capital
        [t = 2:(T2+1)], vars[:m][t] == log(vars[:a_bar][t] / vars[:a_hat][t]) / log(vars[:a_bar][t] / a_lbar)  # Share of students over generational population
        
        [t = 2:(T2+1)], vars[:M][t] == vars[:g][t] * vars[:m][t]  # Number of students
        # in t = 1, m is realized neglecting beta (t=2). Therefore beta (t=2) is completely unidentified
        
        [t = 2:(T2+1)], vars[:comp][t] == vars[:a_hat][t] / vars[:h_bar][t]  # University composition
        [t = 2:(T2+1)], vars[:h_mean][t] == vars[:B][t] * (vars[:a_tilda][t-1])^vars[:gamma][] * (vars[:h_bar][t-1])^(1 - vars[:gamma][] - vars[:theta][]) * 1 / vars[:m][t-1] * 1 / log(vars[:a_bar][t-1] / a_lbar) * 1 / vars[:theta][] * (vars[:a_bar][t-1]^vars[:theta][] - vars[:a_hat][t-1]^vars[:theta][])  # Average human capital supplied
        [t = 2:(T2+1)], vars[:H][t] == vars[:M][t-1] * vars[:h_mean][t]  # Aggregate human capital 
        [t = 2:(T2+1)], vars[:Lr][t] + vars[:Lu][t] == ((1 - vars[:m][t]) * vars[:g][t] + (1 - vars[:m][t-1]) * vars[:g][t-1]) * vars[:d][t]  # Labour supply
        [t = 2:(T2+1)], vars[:q][t] * vars[:H][t]^vars[:alpha][t] * (1 - vars[:alpha][t]) * vars[:Lu][t]^(-vars[:alpha][t]) == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:delta][] * vars[:Lr][t]^(vars[:delta][] - 1) * vars[:X][t]^(1 - vars[:delta][])  # Labour repartition
        
        [t = 2:(T2+1)], vars[:q][t] == (vars[:beta][] * vars[:Y_r][t]) / (vars[:Y_u][t] + vars[:g][t-1]*vars[:phi][])  # Relative price of urban goods
        
        [t = 2:(T2+1)], vars[:w][t] == vars[:q][t] * vars[:alpha][t] * vars[:H][t]^(vars[:alpha][t] - 1) * vars[:Lu][t]^(1 - vars[:alpha][t])  # Human capital wage
        [t = 2:(T2+1)], vars[:f][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:delta][] * vars[:Lr][t]^(vars[:delta][] - 1) * vars[:X][t]^(1 - vars[:delta][])  # Labor wage
        [t = 2:(T2+1)], vars[:f_r][t] == vars[:f][t] / vars[:f][p_n_year]  # Labor wage (base year = n_year)
        [t = 2:(T2+1)], vars[:r_r][t] == vars[:r][t] / vars[:r][p_n_year]  # Land rents (base year = n_year)
        [t = 2:(T2+1)], vars[:gf][t] == vars[:f][t] / vars[:f][t-1]  # Growth labor wage
        [t = 2:(T2+1)], vars[:r][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:Lr][t]^vars[:delta][] * (1 - vars[:delta][]) * vars[:X][t]^(-vars[:delta][])  # Land rents
        [t = 2:(T2+1)], vars[:p][t] == ((vars[:d][t-1] * vars[:f][t-1] + vars[:d][t] * vars[:f][t])*vars[:v][t]^(-1/(1+vars[:beta][])) +
                        (vars[:r][t]*vars[:X][t] / vars[:g][t-1] + vars[:phi][]*vars[:q][t])*(vars[:v][t]^(-1/(1+vars[:beta][]))-1)) / (vars[:w][t])  # Function of different prices
        [t = 2:(T2+1)], vars[:s][t] == vars[:r][t] * vars[:X][t] / vars[:Y][t]  # Share of land rents on GDI
        [t = 2:(T2+1)], vars[:Y][t] == vars[:Y_r][t] + vars[:q][t]*vars[:Y_u][t]  # GDP
        [t = 2:(T2+1)], vars[:Y_r][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:Lr][t]^vars[:delta][] * vars[:X][t]^(1 - vars[:delta][]) # GDP rural
        [t = 2:(T2+1)], vars[:Y_u][t] == vars[:H][t]^vars[:alpha][t] * vars[:Lu][t]^(1 - vars[:alpha][t]) # GDP urban
        [t = 2:(T2+1)], vars[:Y_pc][t] == vars[:Y][t]/vars[:T][t] # GDP pc
    end)


    # Lock A, B, alpha at indices after T1+2 to their values at index T1+2. d and P are already fixed there
    @constraints(model, begin
        [t = T1+3:T2+2], vars[:A][t] == vars[:A][T1+2]
        [t = T1+3:T2+2], vars[:B][t] == vars[:B][T1+2]
        [t = T1+3:T2+2], vars[:alpha][t] == vars[:alpha][T1+2]
    end)


    # ======= Steady-state constraints  ========
    # for t = 1 and t = T2+2

    # init_val
    # ---- linear block ----
    @constraints(model, begin
        [t in [1, T2 + 2]], vars[:g][t] == ppi * vars[:P][t]  # Generation
        [t in [1, T2 + 2]], vars[:T][t] == vars[:g][t] + vars[:g][t]  # Total population
        [t in [1, T2 + 2]], vars[:wpop][t] == vars[:T][t] - vars[:M][t] - vars[:M][t]  # Working population
        [t in [1, T2 + 2]], vars[:a_bar][t] == vars[:h_bar][t]  # Intergenerational human capital transmission
    end)

    # ---- nonlinear block ----
    @NLconstraints(model, begin
        [t in [1, T2 + 2]], vars[:urb][t] == vars[:Lu][t] / vars[:d][t] + vars[:M][t] + vars[:M][t]  # Urban population
        [t in [1, T2 + 2]], vars[:rur][t] == vars[:Lr][t] / vars[:d][t]  # Rural population
        [t in [1, T2 + 2]], vars[:r_urb][t] == vars[:urb][t] / vars[:T][t]  # Urbanization rate
        [t in [1, T2 + 2]], vars[:a_hat][t] == (vars[:p][t] / (vars[:B][t] * vars[:h_bar][t]^(1 - vars[:theta][] - vars[:gamma][] / 2)))^(2 / (2 * vars[:theta][] + vars[:gamma][]))  # Indifferent individual threshold
        [t in [1, T2 + 2]], vars[:a_tilda][t] == (vars[:a_hat][t] * vars[:a_bar][t])^(1 / 2)  # Peer effect
        [t in [1, T2 + 2]], vars[:h_bar][t] == vars[:B][t] * (vars[:a_bar][t])^vars[:theta][] * (vars[:a_tilda][t])^vars[:gamma][] * (vars[:h_bar][t])^(1 - vars[:gamma][] - vars[:theta][])  # Knowledge frontier
        [t in [1, T2 + 2]], vars[:h_min][t] == vars[:B][t] * (vars[:a_hat][t])^vars[:theta][] * (vars[:a_tilda][t])^vars[:gamma][] * (vars[:h_bar][t])^(1 - vars[:gamma][] - vars[:theta][])  # Minimum human capital at university
        [t in [1, T2 + 2]], vars[:g_hbar][t] == vars[:h_bar][t] / vars[:h_bar][t]  # Growth knowledge frontier
        [t in [1, T2 + 2]], vars[:g_hmin][t] == vars[:h_min][t]/vars[:h_min][t]   # Growth minimum human capital
        [t in [1, T2 + 2]], vars[:m][t] == log(vars[:a_bar][t] / vars[:a_hat][t]) / log(vars[:a_bar][t] / a_lbar)  # Share of students over generational population
        
        [t in [1, T2 + 2]], vars[:M][t] == vars[:g][t] * vars[:m][t]  # Number of students
        # in t = 1, m is realized neglecting beta (t=2). Therefore beta (t=2) is completely unidentified
        
        [t in [1, T2 + 2]], vars[:comp][t] == vars[:a_hat][t] / vars[:h_bar][t]  # University composition
        [t in [1, T2 + 2]], vars[:h_mean][t] == vars[:B][t] * (vars[:a_tilda][t])^vars[:gamma][] * (vars[:h_bar][t])^(1 - vars[:gamma][] - vars[:theta][]) * 1 / vars[:m][t] * 1 / log(vars[:a_bar][t] / a_lbar) * 1 / vars[:theta][] * (vars[:a_bar][t]^vars[:theta][] - vars[:a_hat][t]^vars[:theta][])  # Average human capital supplied
        [t in [1, T2 + 2]], vars[:H][t] == vars[:M][t] * vars[:h_mean][t]  # Aggregate human capital # WHY t-1 in h_mean??
        [t in [1, T2 + 2]], vars[:Lr][t] + vars[:Lu][t] == ((1 - vars[:m][t]) * vars[:g][t] + (1 - vars[:m][t]) * vars[:g][t]) * vars[:d][t]  # Labour supply
        [t in [1, T2 + 2]], vars[:q][t] * vars[:H][t]^vars[:alpha][t] * (1 - vars[:alpha][t]) * vars[:Lu][t]^(-vars[:alpha][t]) == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:delta][] * vars[:Lr][t]^(vars[:delta][] - 1) * vars[:X][t]^(1 - vars[:delta][])  # Labour repartition
        
        [t in [1, T2 + 2]], vars[:q][t] == (vars[:beta][] * vars[:Y_r][t]) / (vars[:Y_u][t] + vars[:g][t]*vars[:phi][])   # Relative price of urban goods
        
        [t in [1, T2 + 2]], vars[:w][t] == vars[:q][t] * vars[:alpha][t] * vars[:H][t]^(vars[:alpha][t] - 1) * vars[:Lu][t]^(1 - vars[:alpha][t])  # Human capital wage
        [t in [1, T2 + 2]], vars[:f][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:delta][] * vars[:Lr][t]^(vars[:delta][] - 1) * vars[:X][t]^(1 - vars[:delta][])  # Labor wage
        [t in [1, T2 + 2]], vars[:f_r][t] == vars[:f][t] / vars[:f][p_n_year]  # Labor wage (base year = n_year)
        [t in [1, T2 + 2]], vars[:r_r][t] == vars[:r][t] / vars[:r][p_n_year]  # Land rents (base year = n_year)
        [t in [1, T2 + 2]], vars[:gf][t] == vars[:f][t] / vars[:f][t]  # Growth labor wage
        [t in [1, T2 + 2]], vars[:r][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:Lr][t]^vars[:delta][] * (1 - vars[:delta][]) * vars[:X][t]^(-vars[:delta][])  # Land rents
        [t in [1, T2 + 2]], vars[:p][t] == ((vars[:d][t] * vars[:f][t] + vars[:d][t] * vars[:f][t])*vars[:v][t]^(-1/(1+vars[:beta][])) 
                            + (vars[:r][t]*vars[:X][t]/vars[:g][t]+vars[:phi][]*vars[:q][t])*(vars[:v][t]^(-1/(1+vars[:beta][]))-1)) / (vars[:w][t]) # Function of different prices
        [t in [1, T2 + 2]], vars[:s][t] == vars[:r][t] * vars[:X][t] / vars[:Y][t]  # Share of land rents on GDI
        [t in [1, T2 + 2]], vars[:Y][t] == vars[:Y_r][t] + vars[:q][t]*vars[:Y_u][t]  # GDP
        [t in [1, T2 + 2]], vars[:Y_r][t] == vars[:A][t] * vars[:H][t]^vars[:eps][] * vars[:Lr][t]^vars[:delta][] * vars[:X][t]^(1 - vars[:delta][]) # GDP rural
        [t in [1, T2 + 2]], vars[:Y_u][t] == vars[:H][t]^vars[:alpha][t] * vars[:Lu][t]^(1 - vars[:alpha][t]) # GDP urban
        [t in [1, T2 + 2]], vars[:Y_pc][t] == vars[:Y][t]/vars[:T][t] # GDP pc
    end)

Hi @fmanfredini1, welcome to the forum :smile:

You wrote: “managed to solve the program encountering no particular issues”

Do you have a log of the Ipopt solve? Are you happy with the solution? If so, great!

Scaling is a very nuanced topic that you could spend too much time on…

Here’s a tutorial to read: Tolerances and numerical issues · JuMP. It has some links to other sources that are worth watching/reading.

The key is that when scaling you must scale both the variables and the constraints. If you scale only a variable (a column) or only a constraint (a row), it doesn’t really help.
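
To make the "scale both" point concrete, here is a minimal sketch on a toy problem. The data, the names pop_s, univ_s, share_s, and the reference magnitudes pop_scale and univ_scale are all made up for illustration; they are not taken from the model above. The idea is to substitute each large variable by a scaled copy of order one and to divide the constraint row by its natural magnitude at the same time.

```julia
using JuMP, Ipopt

# Hypothetical toy data (not from the post): population ~1e7, university workers ~1e3.
pop_data  = [1.0e7, 1.1e7, 1.2e7]
univ_data = [1.0e3, 1.2e3, 1.5e3]

# Assumed reference magnitudes used for the scaling.
pop_scale, univ_scale = 1.0e7, 1.0e3

model = Model(Ipopt.Optimizer)
set_silent(model)

# Scaled variables: pop_s = pop / pop_scale and univ_s = univ / univ_scale, both O(1).
@variable(model, pop_s[1:3] >= 0, start = 1.0)
@variable(model, univ_s[1:3] >= 0, start = 1.0)
# share_s is the university share measured in units of univ_scale / pop_scale.
@variable(model, share_s >= 0, start = 1.0)

# Original row: univ[t] == share * pop[t].
# After substituting the scaled variables and dividing the row by univ_scale,
# every term in the equation is O(1).
@constraint(model, [t = 1:3], univ_s[t] == share_s * pop_s[t])

# Relative squared deviations are unit-free, so only the data ratio changes.
@objective(model, Min,
    sum((1 - pop_scale * pop_s[t] / pop_data[t])^2 for t in 1:3) +
    sum((1 - univ_scale * univ_s[t] / univ_data[t])^2 for t in 1:3))

optimize!(model)

# Map the solution back to the original units.
pop = pop_scale .* value.(pop_s)
univ = univ_scale .* value.(univ_s)
share = value(share_s) * univ_scale / pop_scale
```

In a model like the one in the question, the same substitution would have to be carried through every constraint in which the rescaled variable appears, which is why doing it for only a few rows tends not to help.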

P.S. Note that in recent versions of JuMP you can replace @NLconstraint with @constraint (and likewise @NLobjective with @objective). See Nonlinear Modeling · JuMP
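
As a rough sketch of what that replacement looks like, assuming a JuMP version recent enough to accept nonlinear expressions in @constraint: the variable h and the recursion below are a made-up stand-in for the knowledge-frontier style constraints in the question, not the actual model.

```julia
using JuMP, Ipopt

T = 3  # hypothetical horizon, for illustration only
model = Model(Ipopt.Optimizer)
set_silent(model)

@variable(model, h[1:T] >= 0.1, start = 1.0)

# Initial condition, written as an ordinary linear constraint.
@constraint(model, h[1] == 1.0)

# Legacy form: @NLconstraint(model, [t = 2:T], h[t] == 1.5 * sqrt(h[t-1]))
# With the newer interface the same nonlinear row goes through @constraint.
@constraint(model, [t = 2:T], h[t] == 1.5 * sqrt(h[t-1]))

optimize!(model)

h_sol = value.(h)
```

The container syntax ([t = 2:T], ...) is unchanged, so the blocks in the question should mostly carry over by swapping the macro name.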