Providing More Initial Parameter Values Than There Are Parameters in the Function Being Optimized

Why does the “extra” parameter value affect the optimization? Apologies if this is in the Optim documentation - I didn’t see it.

For example:

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

x0 = [0.0, 0.0, 2.0]
optimize(f, x0)

gives a different result than

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

x0 = [0.0, 0.0]
optimize(f, x0)

What is the “2.0” in the initial parameter value array doing?

I don’t know exactly what it is doing, but the internals of Optim.jl are probably general enough that the update code doesn’t care about the exact length of the vector. I would simply avoid passing a vector of the wrong length to optimize when the objective is 2-D.

Could it be changing f(x) to this:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2 + 0*x[3]

Would be nice to know - I made this mistake with a more complicated problem and the results I got were actually improved…

We can play the guessing game, but if you are really interested in this corner case, Optim.jl is open source! Type @edit optimize(f, x0) and read the code :wink:

The issue is probably not in the objective function, but in the vector updates. For example, you can write x -= M*x without knowing the exact length of x.
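
For instance, here is a toy sketch of a dimension-agnostic update loop (illustration only, not Optim.jl’s actual internals): nothing in it mentions the length of x explicitly, so a 3-vector passes through just as happily as a 2-vector:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Toy finite-difference gradient descent; the update never
# references the dimension explicitly.
function descend(f, x; steps = 10_000, h = 1e-6, lr = 1e-3)
    g = similar(x, Float64)
    for _ in 1:steps
        for i in eachindex(x)
            e = zeros(length(x)); e[i] = h
            g[i] = (f(x .+ e) - f(x .- e)) / (2h)  # central difference
        end
        x = x .- lr .* g                           # length-agnostic update
    end
    return x
end

descend(f, [0.0, 0.0])       # works in 2-D
descend(f, [0.0, 0.0, 2.0])  # and in 3-D; the gradient in x[3] is zero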


I can’t replicate this. I get the same minimizer (within numerical error) for the first two coordinates. Of course the third coordinate ends up at an arbitrary value, since f never reads it, but that should not matter.

If you are asking why the results are not exactly equal: they are two different problems (in \mathbb{R}^2 and \mathbb{R}^3), and different steps are taken.
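
For reference, a minimal way to check this, using Optim’s standard minimizer accessor (optimize(f, x0) defaults to Nelder-Mead for a plain objective):

using Optim

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

r2 = optimize(f, [0.0, 0.0])       # 2-D problem
r3 = optimize(f, [0.0, 0.0, 2.0])  # 3-D problem; f never reads x[3]

Optim.minimizer(r2)       # ≈ [1.0, 1.0]
Optim.minimizer(r3)[1:2]  # first two coordinates also ≈ [1.0, 1.0]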

Why would you think it changes f? The function you originally defined is applicable to vectors of length 3 (or any length ≥2) as is:

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
f([1, 2, 3])  # == 100.0

Well, I don’t see why it would affect the result. Are you sure you don’t have a sum over x, a norm, or something similar that would be affected by the third element?

I don’t think it does (in the \approx sense). Did you run the example?

Ah, so it actually does in this case, because you’re using Nelder-Mead and the initial simplex is built from the initial x. The extra coordinate changes the simplex centroid for x[1] and x[2], which changes the progression of the algorithm. Edit: but the solutions are approximately the same (one has an element just above one and the other just below; the minimizer is [1, 1]).
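
A quick sketch to see that it’s the path, not the answer, that changes: make the method explicit and compare iteration counts and objective values (Optim.iterations and Optim.minimum are standard accessors):

using Optim

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

r2 = optimize(f, [0.0, 0.0], NelderMead())
r3 = optimize(f, [0.0, 0.0, 2.0], NelderMead())

# Different initial simplices lead to different progressions...
Optim.iterations(r2), Optim.iterations(r3)  # typically differ
# ...but both land essentially at the minimum:
Optim.minimum(r2), Optim.minimum(r3)        # both ≈ 0.0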


Thanks for all the responses. This makes sense; it’s much more plausible than my hypothesis.

Thanks again,

DS