Optim and differential equations?

Hard to say without the code at hand. Does it do the shrink step a lot? Or maybe it just has a hard time settling down.

I just sent you the code, in case you are interested in looking into it – you are much more of an expert than I am.

Here is a close-up of where the L-BFGS stops in the univariate case:

In this case, the L-BFGS/Fminbox solver stops at ca. 0.7577.
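For reference, a box-constrained L-BFGS run in Optim is set up roughly as below. This is only a minimal sketch with a placeholder objective and hypothetical bounds, not the actual model code from the thread:

    using Optim

    # Minimal sketch of a box-constrained L-BFGS (Fminbox) call.
    # `loss`, `lb`, `ub`, and `x0` are placeholders, not the real model.
    loss(x) = sum(abs2, x .- 0.75)    # placeholder objective
    lb, ub = [0.3], [2.0]             # hypothetical parameter bounds
    x0 = [1.0]                        # initial guess inside the box

    res = optimize(loss, lb, ub, x0, Fminbox(LBFGS()))
    Optim.minimizer(res), Optim.minimum(res)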

Note: I removed variables #s 1, 2, 4, 5 when I did the multivariable optimization. [In other words, plot matrix elements (1,1), (1,2), (1,4), and (2,1).]

Also, note that element # 6 [plot matrix element (2,2)] has a distinct minimum if I plot from parameter value 0.15 onwards, instead of from value 0.3.

Nelder-Mead doesn’t scale well with dimension: it maintains a simplex of D+1 points, and a shrink step requires D new evaluations of the error function, while other algorithms scale better.
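To see the scaling in practice, the iteration and function-call counts can be compared on a toy problem (purely an illustration, not the model discussed here):

    using Optim

    # Toy 10-dimensional problem, only to illustrate how to inspect the
    # number of objective evaluations reported by Optim.
    f(x) = sum(abs2, x .- 1)
    x0 = zeros(10)

    res_nm = optimize(f, x0, NelderMead())
    res_bf = optimize(f, x0, LBFGS())   # gradient via finite differences here

    println("Nelder-Mead: ", Optim.iterations(res_nm), " iterations, ",
            Optim.f_calls(res_nm), " f-calls")
    println("L-BFGS:      ", Optim.iterations(res_bf), " iterations, ",
            Optim.f_calls(res_bf), " f-calls")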

You can also try SAMIN from Optim btw. I’m finding this optimum

     Obj. value:      0.0290794633

       parameter      search width
         1.82413           0.00000 
         1.61697           0.00000 
         0.58598           0.00000 
         1.12229           0.00000 
         1.57725           0.00000 
         0.55275           0.00000 
         1.49769           0.00000 
         1.55732           0.00000 
         1.64324           0.00000 
         0.79532           0.00000 

though it may of course vary from run to run
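For completeness, a SAMIN call in Optim looks roughly like the sketch below; the objective, bounds, and starting point are placeholders for the actual 10-parameter problem:

    using Optim

    # Sketch only: `loss`, `lb`, `ub`, and `x0` stand in for the real
    # objective and parameter bounds.
    loss(x) = sum(abs2, x .- 1)            # placeholder objective
    lb = fill(0.0, 10)                     # hypothetical lower bounds
    ub = fill(2.0, 10)                     # hypothetical upper bounds
    x0 = lb .+ rand(10) .* (ub .- lb)      # random start inside the box

    res = optimize(loss, lb, ub, x0, SAMIN(), Optim.Options(iterations = 10^5))
    Optim.minimum(res), Optim.minimizer(res)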

OK… I’ve tested SAMIN, too. To save some time, I commented out Nelder-Mead. Here is a comparison of parameter values:


The local optimizers (L-BFGS and GD = GradientDescent in Optim) used ca. 14 seconds to find their suggested optimum, while the global solvers used ca. 8 seconds (SA = SAMIN in Optim) and ca. 70 seconds (BB = BlackBoxOptim default solver).

The lowest loss function was found by BlackBoxOptim (BB), followed by SAMIN (SA).

Did you use the default number of “iterations” in SAMIN? I think you can match BB even if you only go up to, say, 20-25 seconds. It will all be problem dependent, of course. @mcreel might have something clever to say about hyperparameter selection for this problem.

Yes, I used the default number of iterations. I next tried to use SAMIN in a loop where I draw the initial “guess” from rand(). The idea was to plot how the solution possibly varies depending on the initial guess. Problem: it didn’t converge. I then tried to increase the max number of iterations. I found an example in the documentation, and raised the max number of iterations to 10^6.

Probably too much… the solver took forever. Any idea of what the default max number of iterations is?

It should be 1000 (10^3). Try 5000 or even 8000 instead.

Hm… I raised the max number of iterations to 10^4, and solved the optimization 10 times with SAMIN using random initial values. The cost function varied in the interval [2.95,3.58]\times 10^{-3}, with computation time ca. 40 s per optimization and with parameter spread as below:

With BlackBoxOptim, default set-up, the cost function varied in the interval [2.91,2.95]\times 10^{-3}, with computation time ca. 24 s per optimization, and [updated!] also with visible parameter spread:

I’m sure it is possible to tweak the hyperparameters of SAMIN to perform better/faster.

For SAMIN, if you observe a spread, as in the graph above, it is because the cooling rate, rt, is too close to 1 (i.e., cooling is too slow), given the maximum number of iterations.

The conservative policy is to leave the cooling rate fairly close to 1, and increase the iteration limit. That, of course, will increase the run time a lot, as noted above.

If the problem is not too irregular, you can safely lower the cooling rate. This will reduce the runtime, too. Try running SAMIN with rt=0.5, or rt=0.25, and see what you get. If the parameter spread disappears, that’s an indication you can get away with rapid cooling for this problem. You might set the iteration limit high, e.g., 10^6, and rt fairly low, e.g., rt=0.25. Give it a try, and see if you get convergence quickly enough so that the high iteration limit is never reached.
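A sketch of that suggestion, with the same placeholder names as in the SAMIN example above:

    using Optim

    # Faster cooling (lower rt) plus a generous iteration limit; if the
    # problem is not too irregular, SAMIN should converge long before the
    # limit is reached. `loss`, `lb`, `ub`, and `x0` are placeholders.
    loss(x) = sum(abs2, x .- 1)
    lb, ub = fill(0.0, 10), fill(2.0, 10)
    x0 = lb .+ rand(10) .* (ub .- lb)

    res = optimize(loss, lb, ub, x0,
                   SAMIN(rt = 0.25, verbosity = 1),   # rt = temperature reduction factor
                   Optim.Options(iterations = 10^6))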

Thanks, I’ll check. I suspect part of my problem is the non-smoothness of the cost function. As an example, here is the original cost function variation with scalar parameter:


[see also above]

The cost function above is based on the Tsit5() ODE solver. If I instead use Euler() with fixed step length, the cost functions appear to be smooth:

This time, the gradient-based/local solvers are much slower (perhaps they don’t give up, or perhaps the Euler solver makes it slow?). I’ll update when I have recomputed the parameter values for the various solvers to see whether there is less spread with the Euler solver/smoother cost function.
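For reference, the solver swap described above amounts to something like the following in DifferentialEquations.jl, with a toy ODE standing in for the real model and a hypothetical step size and tolerances:

    using DifferentialEquations

    # Toy ODE, only to show the solver choices being compared.
    f!(du, u, p, t) = (du[1] = -p[1] * u[1])
    prob = ODEProblem(f!, [1.0], (0.0, 10.0), [0.5])

    # Adaptive Tsit5: fast, but the adaptive step sizes can make the loss
    # look "noisy" as the parameters change.
    sol_adaptive = solve(prob, Tsit5())

    # Fixed-step Euler: less accurate per step, but the loss tends to vary
    # smoothly with the parameters because the discretization never changes.
    sol_fixed = solve(prob, Euler(), dt = 1e-3)

    # Middle ground: keep the adaptive solver but tighten its tolerances,
    # which also reduces the apparent noise (at extra computational cost).
    sol_tight = solve(prob, Tsit5(), abstol = 1e-8, reltol = 1e-8)

Tightening the tolerances of the adaptive solver is often a gentler alternative to switching to fixed-step Euler, since accuracy is then still controlled by the solver rather than by a hand-picked dt.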

UPDATE:
The L-BFGS method now gives the best result, but is rather slow (perhaps because it actually finds the optimum this time??). The minimum varies in the interval [2.958,2.969]\times 10^{-3} (one “outlier”), with parameter variation:

SAMIN with rt=0.5 and max no. iterations 10^4 gives minimum in [2.961,3.169]\times 10^{-3}:

The default BlackBoxOptim algorithm gives minimum in [2.960,2.975]\times 10^{-3}:

These smooth profiles look a lot easier to deal with. However, it appears that mr and ms may be zero, if they are constrained to be non-negative. Also, Rr and Rs seem to be poorly identified: the objective function is approximately flat, it seems. Perhaps it’s a data scaling problem?

mr and ms are masses of copper in an electric generator, and cannot be zero. They can be measured quite accurately, but the model assumes homogeneous temperature in the copper, which of course is not 100% correct.

Rr and Rs… their constancy is due to a programming error (someone else programmed it, but I should have checked the model…).

Right, but Michael’s point is that judging from your data and model it seems that the best fit is indeed that those parameters are 0 at the optimum (at least from these partial plots). If that is not true, something in the whole setup is strange.

Just as an FYI, the parameters in the figure that shows the estimates across runs are not 1, 2, 3, … as in the plots. In the figure, the parameters [1, 2, …, 10] correspond to plots [3, 6, 7, 8, 9, 10, 11, 12, 13, 14], respectively (so @BLI holds the “weird” constants fixed at some values).

True. On the other hand, the plots indicate optima if single parameters are varied. If multiple variables are varied simultaneously, I doubt that mr and ms go to zero – that would go against the physics of the problem: essentially, some temperatures would then be able to change instantaneously. And they don’t do that in practice.

I guess this example illustrates that:

  1. Choice of the best model fitting algorithm (“optimizer”) is non-trivial, and to some degree depends on the shape of the loss function.
  2. The choice of “shooting algorithm” (ODE solver) is non-trivial – should one use a fast algorithm with variable step length, which leads to a “noisy” loss function, or a less efficient (less accurate?) fixed step-length algorithm that probably (!?) gives a smoother loss function?
  3. With “too many” parameters, the cost function will probably have a relatively flat “valley” around the optimum, and the minimizer (parameter) values can vary wildly without a large variation in the minimum value of the cost function – in effect, the solution is non-unique. The problem of identifiability is interesting in this respect: how to detect poor identifiability (a rough local check is sketched below), and how to reduce the parameter set to maximize identifiability.
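A rough local check of identifiability, as a sketch: evaluate the Hessian of the loss at the fitted parameters and look for near-zero eigenvalues, which flag directions that the data barely constrains. The loss below is a placeholder with a deliberately “flat” third parameter:

    using ForwardDiff, LinearAlgebra

    # Placeholder loss: p[1] and p[2] are coupled, p[3] is almost unidentified.
    loss(p) = (p[1] - 1)^2 + (p[1] + p[2] - 3)^2 + 1e-10 * p[3]^2
    p_hat = [1.0, 2.0, 0.0]                 # hypothetical fitted parameters

    H = ForwardDiff.hessian(loss, p_hat)
    λ = eigvals(Symmetric(H))
    println("Hessian eigenvalues: ", λ)     # near-zero eigenvalue ⇒ flat direction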

It is also known that model fitting with a cost function found from shooting (output error, etc.) can have loads of local minima – this particular example doesn’t really show that.

Ah, well: thanks to all participants for the fruitful input! I’ve learnt a lot! I’ll leave the further details to my summer job student, and move on to my other related problem: fitting a model of a heated water tank to experimental data. [Another student fitted 3 parameters to those data + 20 initial temperature states (discretized PDE) – it took 5 hours in Python. I hope it will run much faster in Julia.]

I also look forward to seeing improvements in all the tested algorithms, and to learning how to utilize them better. @pkofod: you have indicated that the “master” version of Optim will work even better – I look forward to testing that, too.

Some analysis is needed. We hope to add that form of analysis to ModelingToolkit.

Interesting! Some theoretical identifiability analysis can probably also be done purely based on the model by using AD, etc. See, e.g., Global Identifiability of Nonlinear Model Parameters (ScienceDirect). But these ideas may be too complex for “real world” models.

What was too complicated in 1997 might be feasible in 2019?
