Optimization with Optim gives different results each time

Michael_Barmann · January 28, 2021, 8:38pm

I’m using IPNewton() from the Optim package for an inequality-constrained optimization problem. Strangely, I keep getting different results each time I run the same block of code (without changing anything). For example, the first time I ran the optimization, the result was this:

 * Status: success

 * Candidate solution
    Final objective value:     1.000671e-02

 * Found with
    Algorithm:     Interior Point Newton

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 6.73e-01 ≰ 1.0e-08

 * Work counters
    Seconds run:   63  (vs limit Inf)
    Iterations:    132
    f(x) calls:    243
    ∇f(x) calls:   243

I then copied and pasted my code into a new Jupyter notebook cell and ran it, and got this wildly different answer:

 * Status: success

 * Candidate solution
    Final objective value:     1.706405e+00

 * Found with
    Algorithm:     Interior Point Newton

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 1.48e+02 ≰ 1.0e-08

 * Work counters
    Seconds run:   147  (vs limit Inf)
    Iterations:    260
    f(x) calls:    457
    ∇f(x) calls:   457

The first time gave me an objective value that was 100x better! I’m very confused, as I expected the results to be identical. (And to be clear, I did not change the objective function in between runs.) As the trace shows, the initial function value in the two runs is the same, but the initial Lagrangian, gradient norm, and mu are all different:

Run 1:

Iter     Lagrangian value Function value   Gradient norm    |==constr.|      μ
     0   4.269013e+05     4.268759e+05     2.175804e+06     0.000000e+00     5.52e-01

Run 2:

Iter     Lagrangian value Function value   Gradient norm    |==constr.|      μ
     0   4.440598e+05     4.268759e+05     2.177987e+06     0.000000e+00     3.72e+02

I’ve done a few more runs and I continue to get different results each time.

So my questions are:

Why is this happening? Does the algorithm involve some randomization or stochasticity?
Is there a way to make it give the same results each time? I would really like for my results to be reproducible.

Update 1/29: Sorry for the lack of code. I’m working on making a MWE, though this is proving harder than expected. I tried copying and pasting the constrained optimization w/IPNewton example from the documentation, but it didn’t have the reproducibility issue (I got the same result each time I hit Ctrl+Enter). So now I’m working on making a simplified version of my code that still exhibits the reproducibility issue.

BLI · January 28, 2021, 9:18pm

I haven’t tested IPNewton() yet, but I’m curious about it since it seems to be a valuable addition to Optim. I’m wondering what the size of your problem is – how many unknowns x, how many inequality conditions, etc.?

Of course, I’m also curious about the quality of it compared to, say, Ipopt.

Michael_Barmann · January 28, 2021, 10:00pm

Hi, I’d be happy to talk about optimizers in general elsewhere, but this is not really on topic for this thread.

thisrod · January 29, 2021, 6:55am

That’s what you’re doing wrong: to get reproducible results, you clearly need to run a different block of code.

To give you more precise advice than that, we’ll need to see the code you’re running, and some other things described here.

dpsanders · January 29, 2021, 7:26am

As @thisrod correctly pointed out, it’s impossible to say without more information. Are you fixing the initial starting point?

BLI · January 29, 2021, 12:48pm

To repeat my question: I’m wondering what the size of your problem is – how many unknowns x , how many inequality conditions, etc.?

It is difficult to do any testing without having any idea of what you do.

tbeason · January 29, 2021, 4:08pm

I’ll start with the basic question: did you hide a call to rand() in your objective function?

Michael_Barmann · January 29, 2021, 5:28pm

Nope, there is no call to rand() anywhere. The objective function is deterministic.

Michael_Barmann · January 29, 2021, 5:46pm

Yes, x0 is fixed. And sorry, will try to post a MWE soon.

Michael_Barmann · January 29, 2021, 5:49pm

I’m optimizing over 19 variables, each with box constraints, along with two inequality constraints: 0 <= x[1]/3 - x[2] <= 1/3 and 5 <= 1/x[3] + 1/x[4] <= 6.

Michael_Barmann · January 29, 2021, 5:55pm

I’ll try to post a MWE soon…I don’t understand your point though. Why would I need to run a different block of code to reproduce my results? For example, if I have one Jupyter notebook cell whose only code is foo(5), where foo() is some function defined in a previous cell, why wouldn’t executing the cell with foo(5) give the same result each time? This is the first time I’ve encountered a situation (in the 8 weeks I’ve been using Julia) where this non-reproducible behavior has occurred.

sylvaticus · January 29, 2021, 8:31pm

Could it be that somehow some status related to the optimisation is preserved in between?

If you copy your notebook and at the beginning of the second notebook you make somerand() (so that they are not exactly the same), the first time you run the solver in the second notebook, you have the same result as in the first optimisation in the first notebook ?

Michael_Barmann · January 29, 2021, 11:12pm

Thanks for your reply! I’m not sure what you meant by “make some rand()” though–did you mean use Random.seed!()?

Aside from that, I tried your suggestion: I copied everything into a new notebook, restarted the kernel in the first notebook, ran both, and they gave the same answer of 6.239305e-03. The optimization is being called within a function I wrote called ModelFit(). I noticed that when I make a trivial change to the parameters of ModelFit() (i.e. to parameters having nothing to do with the optimization calculation), and then I run ModelFit() for a second time, I get 6.239305e-03 again. But if I simply run ModelFit a second time without changing anything, I get a slightly different answer.

So I think you are on to something…any idea what this means?

Michael_Barmann · January 29, 2021, 11:33pm

I just want to highlight @sylvaticus’s suggestion, as it has provided some interesting information that seems useful:

Following this sugestion, I copied everything into a new notebook, restarted the kernel in the first notebook, ran both, and they gave the same answer of 6.239305e-03.

Some additional info: The optimization is being called within a function I wrote called ModelFit(), which I’m calling with the following code:

@time out = ModelFit(ODE_model = One_Age_Model,
             ODE_vars = ["S","E","P","A","AR","I","H","R","D","Hc"],
             vars_to_fit = ["D","Hc"],
             paramIC_dict = ParIC_dict,
             data_dict = Data_dict,
             f_calc_ICs = f_ICs,
             fit_range = ["03-22-2020","06-30-2020"],
             date_obs_last = "08-01-2020",
             forecast_until = "08-15-2020",
             norm = mean_square_rel_error,
             norm_weights = fill(1,110),
             norm_scale = 1,
             N0 = 127575528,
             integrator_options = int_options,
             optimizer_options = optim_options)

After refreshing the notebook kernel, the first time I execute the above code produces an objective function value of 6.239305e-03. Executing the above code a second time gives 6.286389e-03. Then executing it a third time gives 6.225802e-03. Subsequent evaluations keep giving different outputs. But whenever I refresh the kernel, this pattern always repeats, with 6.239305e-03 always the first answer it gives, followed by 6.286389e-03, and then 6.225802e-03, and on and on…

Another observation: If make a trivial change to the parameters, such as changing the S in ODE_vars to SS (which has no effect on the optimization whatsoever), I get back the original value of 6.239305e-03 again. Running this code again (with the SS modification) gives 6.286389e-03 again. And then I again get 6.225802e-03 on the third try. I’m really baffled by this…

So in summary, the result is reproducible only if I 1) refresh the kernel, or 2) copy the code into a new notebook. Apologies for not coming up with a MWE. I tried but struggled to produce a simple example that yielded the same phenomenon. I can post the original code on GitHub if anyone wants to take a look and try it out for themselves.

thisrod · February 1, 2021, 12:46am

That’s good news. It sounds like the issue is with notebooks, variables and evaluation; not with Optim doing something non-deterministic.

Not using a notebook might be an effective workaround. What happens when you paste all these cells into a file test.jl, and run julia test.jl from the command line?

My favorite method is bisection. Replace half the code with a stub, and see if the problem still occurs. If it does, replace half the remaining code with a stub and repeat. If it doesn’t, restore half the code you replaced and repeat.

For example, you might replace the ODE solving with solution = 1:100.

Good luck.

mcreel · February 1, 2021, 7:40am

Perhaps this has to do with the tolerance of the ODE solver and is not related to Optim? Plugging in a fixed solution, as in the previous post, would help to confirm that.

Michael_Barmann · February 14, 2021, 7:35pm

Thanks to everyone for the replies and suggestions. Interestingly, I made a new version of my function and it works as expected–executing it with the same inputs always yield the same output, every time. (Unfortunately, I still don’t know what I did to fix the problem, or what the problem was.)

I’ve put this issue aside for now, having spent a few hours getting nowhere with it. If it comes up again and I gain any new insight, I’ll be sure to provide an update.

Topic		Replies	Views
How is Optim.jl's IPNewton so good compared to (exponentiated) gradient descent? Optimization (Mathematical)	4	1061	August 17, 2022
IPNewton (suddenly) fails on trivial example New to Julia	2	60	March 25, 2025
Newton optimization with Optim doing strange things Optimization (Mathematical)	36	2774	April 12, 2018
How to use specific convergence measures in Optim.jl? Optimization (Mathematical) question , optim	3	2248	October 25, 2021
Parameters Optimization with Optim General Usage optim	4	173	February 12, 2025

Optimization with Optim gives different results each time

Related topics