I was able to verify the indicated minima for n=6 and n=9 in Fortran and in R. The case n=12 turned out to be more difficult: I got the correct result only with the specialized function lsqnonlin() from the R package pracma.
In Julia I tried the LsqFit package. It requires a ‘model’, not just the name of a function. I know from other experiments that it is not as accurate and powerful as, e.g., the ‘nlsr’ tool in R. ‘Optim’ with different methods did not really come close, and may not be appropriate for least-squares minimization problems anyway.
It is necessary to determine the minimum very accurately, because the location of the minimizer depends heavily on how accurately the minimum value is found.
Can you suggest a solver in Julia that will return the same (or a better) minimum of the Watson function, say on [0,1]^n for n=12? And which solver in Julia is recommended for nonlinear least-squares problems (with a Levenberg–Marquardt extension) in general, when ‘LsqFit’ is not powerful enough?
PS. Implementation of Watson in Julia
function watson(x)
    # Watson test function: returns the sum of squares of the m+2 residuals,
    # evaluated at the points t_i = i/29.
    m = 29; n = length(x)
    t = zeros(Float64, m)
    for i = 1:m
        t[i] = float(i) / m
    end
    F = zeros(Float64, m + 2)
    for i = 1:m
        f11 = 0.0                       # sum over j of (j-1) * x[j] * t[i]^(j-2)
        for j = 2:n
            f11 += (j - 1) * x[j] * t[i]^(j - 2)
        end
        f12 = 0.0                       # sum over j of x[j] * t[i]^(j-1)
        for j = 1:n
            f12 += x[j] * t[i]^(j - 1)
        end
        F[i] = f11 - f12^2 - 1.0
    end
    F[m + 1] = x[1]
    F[m + 2] = x[2] - x[1]^2 - 1.0
    return sum(F.^2)
end
Note that your watson function returns the scalar sum of squares ∑ᵢ fᵢ(x)², not the vector of residuals fᵢ(x), which is what is required by a Levenberg–Marquardt type algorithm. If you only supply the sum-of-squares objective, that is not enough to use Levenberg–Marquardt; with only the sum of the squares you have to fall back on a generic nonlinear-optimization algorithm.
I don’t know how you called lsqnonlin from R’s pracma with this function, since my understanding is that it emulates the lsqnonlin function from MATLAB, which likewise requires you to supply f itself, not the sum of the squares.
Are you sure you’re doing an apples-to-apples comparison?
PS. A more Julian implementation of your watson function, which doesn’t allocate arrays and computes in the numeric type of the input x rather than always in double precision (this is important if you want to use an AD tool like ForwardDiff.jl to get the gradient), is:
function watson(x)
    m = 29; n = length(x)
    F = float(zero(eltype(x)))      # accumulate in the (floated) element type of x
    for i = 1:m
        f11 = zero(F)
        t = oftype(F, i) / m        # t_i = i/29, in the same numeric type
        for j = 2:n
            f11 += (j - 1) * x[j] * t^(j - 2)
        end
        f12 = zero(F)
        for j = 1:n
            f12 += x[j] * t^(j - 1)
        end
        F += (f11 - f12^2 - 1)^2
    end
    F += x[1]^2
    F += (x[2] - x[1]^2 - 1)^2
    return F
end
(This is still quite suboptimal, since you could e.g. compute the powers t^j iteratively in the loop rather than recomputing them from scratch on each iteration, along the lines of the sketch below.)
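For instance, here is a hedged sketch of the two inner loops fused into one, keeping a running power of t rather than calling t^(j-2) and t^(j-1) on every pass (same f11 and f12 as above):

    f11 = zero(F)
    f12 = x[1]                  # j = 1 term of f12, since t^0 == 1
    tpow = one(F)               # invariant at the top of the loop: tpow == t^(j-2)
    for j = 2:n
        f11 += (j - 1) * x[j] * tpow    # uses t^(j-2)
        tpow *= t                       # now tpow == t^(j-1)
        f12 += x[j] * tpow              # uses t^(j-1)
    end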
You can pass the same sort of function to LsqFit in this example, e.g. with a function that returns the vector of the f_i(x) values:
function watsonmodel(indices::UnitRange{<:Integer}, x)
    indices == 1:31 || throw(ArgumentError("invalid indices $indices"))
    n = length(x)
    F = Vector{float(eltype(x))}(undef, 31)
    for i = 1:29
        f11 = zero(eltype(F))
        t = oftype(f11, i) / 29
        for j = 2:n
            f11 += (j - 1) * x[j] * t^(j - 2)
        end
        f12 = zero(eltype(F))
        for j = 1:n
            f12 += x[j] * t^(j - 1)
        end
        F[i] = f11 - f12^2 - 1
    end
    F[30] = x[1]
    F[31] = x[2] - x[1]^2 - 1
    return F
end
In which case LsqFit.jl exactly reproduces the results from the paper:
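A minimal sketch of such a call, assuming zero “data” so that the model values f_i(x) are themselves the residuals being minimized (the starting point here is purely illustrative):

    using LsqFit
    x0 = zeros(12)                      # illustrative starting point, not from the paper
    fit = curve_fit(watsonmodel, 1:31, zeros(31), x0)
    fit.param                           # the fitted minimizer
    sum(abs2, fit.resid)                # sum of squares at the minimum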
For example, using your watson sum-of-squares function, you can use BFGS from Optim.jl. If you use my version of watson, it works with automatic differentiation to get the gradient, so you can turn on autodiff:
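A hedged sketch with the type-generic watson above (the g_tol value is illustrative, not a recommendation):

    using Optim
    result = optimize(watson, zeros(12), BFGS(),
                      Optim.Options(g_tol = 1e-14);
                      autodiff = :forward)     # gradient via ForwardDiff.jl
    Optim.minimum(result), Optim.minimizer(result)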
In general, you should expect the appropriate choice of tolerances in any optimization or fitting problem to be specific to the problem at hand. Blindly using the default tolerances is a common mistake (in any language, with any library); a sketch of passing them explicitly follows.
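For example, assuming LsqFit’s documented x_tol and g_tol keywords (the values shown are illustrative only):

    fit = curve_fit(watsonmodel, 1:31, zeros(31), zeros(12);
                    x_tol = 1e-12, g_tol = 1e-14)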
Thank you for your extensive and very helpful answer. You are right: MATLAB and ‘pracma’ expect the objective function to return a vector, not a scalar. That is the reason I implemented the Watson function using a pre-allocated array: this way I can easily switch between return F and return sum(F.^2).
Sorry, I was not reading the documentation of ‘LsqFit’ carefully enough. I am much impressed by how you applied Optim to this problem. Sometimes I wish the documentation of Julia packages would go into more detail. Thanks again.