with t \in [0,2\pi] and z_1,z_2,d \in \mathbb{R}^n. How would you minimize this objective efficiently? I am trying to derive an analytical solution, but am still not sure if it is possible.
If anyone has suggestions, please feel free to share. It is easy to use Optim.jl or any black-box optimization package to solve the problem, but I wonder if we can do better.
The function is not convex in general; it can have up to three extrema, two of which are local minima and one a local maximum, I think. You might derive an analytical solution by finding the roots of the first derivative, but the solution is likely going to be ugly. So you would probably go for a numerical solution anyway. I think that, e.g., Newton's method would converge fast enough (with initialization like u_0 = \pm 1).
Maybe this specific (more informed) approach is better than a general method from Optim.jl in some aspects, but I have no idea really.
A black-box approach might have the problem that the objective is defined on the circle and hence intrinsically non-convex; you could also use a gradient descent or Newton method on the circle, though.
With box constraints you might be lucky enough to choose the right interval ([0,2\pi] or [-\pi,\pi] or any other shift is possible; they all might yield different local minima); on the circle you do not have to make that choice — and then using Particle Swarm (maybe even Nelder–Mead) or LTMADS might still be nice for finding a global minimum.
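For illustration, a minimal sketch of gradient descent intrinsically on the circle S^1 \subset \mathbb R^2 — this assumes the matrix formulation f(p) = \|Zp - d\|_2^2 with Z = [z_1\; z_2] and p = (\cos t, \sin t) that appears later in the thread, and the step size is an arbitrary choice:

using LinearAlgebra

function circle_gradient_descent(Z, d; p = normalize(randn(2)), η = 0.1, iters = 100)
    for _ in 1:iters
        g = 2 * Z' * (Z * p - d)     # Euclidean gradient of ‖Zp − d‖²
        v = g - dot(g, p) * p        # project onto the tangent space at p
        nv = norm(v)
        nv > 0 || break              # stationary point
        # geodesic (exponential-map) step of length η·‖v‖ in direction −v
        p = cos(η * nv) * p - sin(η * nv) * v / nv
    end
    return p                         # p = (cos t, sin t)
end

Since p never leaves the circle, there is no interval of angles to choose.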
It's periodic in t, so any such interval is equivalent.
I think you're over-thinking this. Yes, it's non-convex. Yes, it might have multiple local minima. But it's only a function of a single variable, and it's not that oscillatory, so you can easily brute-force it.
Nelder–Mead is only local minimization (and is an outdated algorithm that is not even guaranteed to converge to a local minimum). Particle swarm, like genetic algorithms, is one of those algorithms that I personally think gets disproportionate attention because it has a nice story/picture, as opposed to algorithms that just have math. LTMADS is more for handling constraints.
For a simple box-constrained problem in one variable I would probably just use DIRECT or maybe MLSL.
To minimize it, I did two passes using NLopt.jl: first, a global search using the DIRECT-L algorithm, which is essentially a slightly clever brute-force search, and told it to find the minimum to within 5% of 2π. This takes only 21 iterations … honestly, you might as well use a grid search for this. Then I used this as a starting point for a gradient-based local search, which converged to 6 digits in 5 iterations:
function J_objective(x, grad, z₁, z₂, d)
    t = x[1]
    if !isempty(grad)   # NLopt passes an empty grad when no derivative is needed
        grad[1] = ForwardDiff.derivative(t -> J(t, z₁, z₂, d), t)
    end
    return J(t, z₁, z₂, d)
end
# first pass: global search with DIRECT-L
global_opt = NLopt.Opt(:GN_DIRECT_L, 1)
NLopt.lower_bounds!(global_opt, [0.0])
NLopt.upper_bounds!(global_opt, [2π])
NLopt.xtol_abs!(global_opt, 2π * 0.05)
NLopt.min_objective!(global_opt, (x, grad) -> J_objective(x, grad, z₁, z₂, d))
@show min_f1, min_x1, ret1 = NLopt.optimize(global_opt, [1π])
@show NLopt.numevals(global_opt)
# second pass: gradient-based local polish, starting from the global result
local_opt = NLopt.Opt(:LD_LBFGS, 1)
NLopt.lower_bounds!(local_opt, [0.0])
NLopt.upper_bounds!(local_opt, [2π])
NLopt.xtol_abs!(local_opt, 1e-6)
NLopt.min_objective!(local_opt, (x, grad) -> J_objective(x, grad, z₁, z₂, d))
@show min_f, min_x, ret = NLopt.optimize(local_opt, min_x1)
@show NLopt.numevals(local_opt)
The result corresponds to t/2π = min_x[1] / 2π of 0.14496333616083928, which is the first minimum on the plot above. The other minimum is actually equally good, so it is somewhat random which one it finds.
Alternatively, since this is a smooth function of 1 variable, you can fit it to a polynomial to machine precision using Chebyshev approximation, which only takes about 30 function evaluations. Then you can find all the extrema of the polynomial by taking the derivative and finding the roots via a companion-matrix method. ApproxFun.jl does all of this for you, and in fact can do it in a Fourier-series basis that takes the periodicity into account:
using ApproxFun
Jf = Fun(t -> J(t, z₁, z₂, d), Fourier())
r = roots(Jf')
@show r[findmin(Jf.(r))[2]]
which gives
r[(findmin(Jf.(r)))[2]] = 4.238034418629369
which corresponds to t/2π ≈ 0.674, the second local minimum in the plot above. The nice thing about this is that it gives you the root to machine precision, and handles all of the details for you.
PS. I'm actually not sure what method ApproxFun uses to find roots of a Fourier series. @dlfivefifty?
PPS. You could even just do a grid search and then polish the root with a couple of Newton steps on the gradient, no fancy libraries required.
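A sketch of exactly that, assuming the objective J(t) = \|z_1\cos(t) + z_2\sin(t) - d\|_2^2 from the original post, with the derivatives written out by hand:

using LinearAlgebra

J(t, z₁, z₂, d) = sum(abs2, z₁ .* cos(t) .+ z₂ .* sin(t) .- d)

function grid_newton(z₁, z₂, d; ngrid = 100, steps = 5)
    ts = range(0, 2π; length = ngrid + 1)[1:end-1]   # coarse grid over one period
    t = ts[argmin([J(t, z₁, z₂, d) for t in ts])]
    for _ in 1:steps                                 # Newton iterations on J′(t) = 0
        r  = z₁ .* cos(t) .+ z₂ .* sin(t) .- d       # residual r(t)
        dr = z₂ .* cos(t) .- z₁ .* sin(t)            # r′(t)
        dJ = 2 * dot(r, dr)                          # J′ = 2⟨r, r′⟩
        d2J = 2 * (dot(dr, dr) - dot(r, z₁ .* cos(t) .+ z₂ .* sin(t)))  # J″ = 2(⟨r′,r′⟩ + ⟨r,r″⟩)
        t -= dJ / d2J
    end
    return t, J(t, z₁, z₂, d)
end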
PPPS. If you really need to do this blindingly fast, so that the above aren't fast enough for you, this is a simple enough function that I suspect you could do some more analysis to narrow things down, e.g. to prove things about the kinds of local minima it might have — experimentally it always seems to have one local minimum or two equivalent local minima, in which case a single local search would suffice. But my basic point is that this particular problem is trivial to brute-force, so it might not be worth a lot of head-scratching unless you really care about it.
You are right. Probably because I usually do not work with single-variable problems.
Concerning Nelder–Mead and LTMADS — there, too, I have to confess I am far behind your knowledge; for example, I have no clue about DIRECT or MLSL.
Sorry, will not do that again.
That is a quartic polynomial in y (there are formulas to obtain the analytic solution for this problem, or you can use a polynomial solver). You can then find all the (real) solutions of the polynomial, discard the values of y that are not in [-1,1], and find which one satisfies the minimum condition.
If there is something wrong, let me know; I was solving this as I wrote it. But if there is a solution, it involves obtaining a polynomial system in x, y with coefficients that are reductions of the input vectors.
I think there's a bug here. Refining the tolerance (keyword argument tol) results in even more minimizer intervals, most of which overlap. @dpsanders perhaps you're interested.
I will add here a reformulation for your problem, although I guess that you, @juliohm, are not only aware of it but also that this was actually the original problem that you reformulated into the single-variable problem. Here it goes:
\min_{x\in\mathbb R^2} \|Ax - b\|_2^2 \quad \text{subject to} \quad x^\top x = 1,
where A = [z_1\; z_2] \in\mathbb R^{n\times 2} and b = d \in \mathbb R^n.
It can be viewed as an instance of a quadratically constrained quadratic program (QCQP), obviously a nonconvex one.
One way — and perhaps the best one — is direct parameterisation of the feasible set (the unit circle) using a single real variable (the angle).
Another way could be the following. Upon introducing a Lagrange multiplier \lambda\in\mathbb R, we can write the necessary conditions of optimality as
\begin{aligned}
(A^\top A + \lambda I) x &= A^\top b \\
x^\top x - 1 &= 0.
\end{aligned}
The matrix defining the first equation is 2\times 2, hence a formula can be written for its solution as a function of \lambda
x(\lambda) = (A^\top A + \lambda I)^{-1}A^\top b.
We can then substitute into the left hand side of the second equation
e(\lambda) := x(\lambda)^\top x(\lambda) - 1.
What we are after are the roots of this function. The function will generally be a ratio of two polynomials in \lambda, but since we are interested in the roots, the denominator can be cleared out and we are left with the quartic numerator polynomial. The original problem thus boils down to solving the quartic equation.
Whether this offers any advantage to the solution based on parameterising the unit circle using a single real variable and solving a nonconvex optimization on an interval, I do not know. Honestly, I started this with the hope that the dependence will perhaps turn out quadratic… Anyway, I wanted to share this alternative here. Perhaps somebody can build on top of this.
using LinearAlgebra
using Symbolics

A = rand(10, 2)
b = rand(10)

@variables λ
x = (A'*A + λ*I) \ (A'*b)    # x(λ), symbolically
constraint_sym = x'*x - 1    # e(λ), whose roots we are after
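To build on this numerically: after diagonalising A^\top A = Q\,\mathrm{diag}(\delta)\,Q^\top, the function becomes e(\lambda) = \beta_1^2/(\delta_1+\lambda)^2 + \beta_2^2/(\delta_2+\lambda)^2 - 1 with \beta = Q^\top A^\top b, and clearing the denominators gives the quartic explicitly. A sketch (using Polynomials.jl for the root-finding is my own choice; any quartic solver works):

using LinearAlgebra, Polynomials

function solve_secular(A, b)
    F = eigen(Symmetric(A'A))               # AᵀA = Q * Diagonal(δ) * Qᵀ (2×2)
    δ, Q = F.values, F.vectors
    β = Q' * (A'b)
    p₁ = Polynomial([δ[1], 1.0])            # (δ₁ + λ)
    p₂ = Polynomial([δ[2], 1.0])            # (δ₂ + λ)
    # β₁²(δ₂+λ)² + β₂²(δ₁+λ)² − (δ₁+λ)²(δ₂+λ)² = 0 is the quartic in λ
    q = β[1]^2 * p₂^2 + β[2]^2 * p₁^2 - p₁^2 * p₂^2
    λs = [real(λ) for λ in roots(q) if abs(imag(λ)) < 1e-8]
    xs = [(A'A + λ*I) \ (A'b) for λ in λs]  # x(λ) at each stationary λ
    return xs[argmin([sum(abs2, A*x - b) for x in xs])]
end

The returned x lies on the unit circle by construction, and the angle is recovered as t = atan(x[2], x[1]).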
Thanks, I didn't realize that trigonometric polynomials (truncated Fourier series) had a companion matrix. I see that there is an article by Boyd (2013) on this.
@longemen3000 I believe there is an issue with your derivation. I did it here on paper and the sign of the d_i z_{1,i} \sin(t) term should be positive. Could you please double check?
I am planning to combine your explicit formula of the quartic polynomial with the insights shared by @zdenek_hurak to write down an explicit solution of the 4 roots with Ferrari's method. I believe this solution will only require 4 evaluations of the objective, and because the constants are all inner products, we can do everything without array allocations, directly on the stack.
By the way, the approach that I described here, which expresses the solution x as a function of \lambda and then uses the constraint to infer the acceptable values of \lambda, is reportedly known as the "secular equation"; see pages 8 through 10 (or so) in the slides https://people.inf.ethz.ch/gander/talks/Talk2018.pdf. The paper by the numerical computing giant Gene Golub they refer to (and from which they show some screenshots) is Forsythe, George E., and Gene H. Golub. "On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere". Journal of the Society for Industrial and Applied Mathematics 13, no. 4 (December 1965): 1050–68. https://doi.org/10.1137/0113073.
I did some tests, but apparently the quartic polynomial I am coding has complex roots. Maybe I derived something wrong?
using LinearAlgebra

# quartic polynomial Ax⁴ + Bx³ + Cx² + Dx + E = 0 with x = sin(t)
function polynomial(z₁, z₂, d)
    a = z₁ ⋅ z₂
    b = z₁ ⋅ z₁ - z₂ ⋅ z₂
    d₁ = d ⋅ z₁
    d₂ = d ⋅ z₂
    A = 4 * a * a + b * b
    B = -(4 * a * d₁ + 2 * b * d₂)
    C = -4 * a * a + d₁ * d₁ + d₂ * d₂ - b * b
    D = 2 * (a * d₁ - b * d₂)
    E = a * a + d₂ * d₂
    A, B, C, D, E
end
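A quick way to exercise these coefficients end to end (a sketch assuming Polynomials.jl, and the objective J(t) = \|z_1\cos(t) + z_2\sin(t) - d\|_2^2 from the original post; if every root comes back complex, as above, that by itself flags a sign problem):

using Polynomials

J(t, z₁, z₂, d) = sum(abs2, z₁ .* cos(t) .+ z₂ .* sin(t) .- d)

A, B, C, D, E = polynomial(z₁, z₂, d)
p = Polynomial([E, D, C, B, A])   # Polynomials.jl orders coefficients low → high degree
rs = roots(p)
# keep real roots with |sin(t)| ≤ 1; each yields the two angles t and π − t
sines = [real(r) for r in rs if abs(imag(r)) < 1e-8 && abs(real(r)) <= 1]
candidates = vcat([[asin(s), π - asin(s)] for s in sines]...)
tbest = candidates[argmin([J(t, z₁, z₂, d) for t in candidates])]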
It's definitely better to use the quartic polynomial. That's an analytical solution; the minimal polynomial is a perfectly good representation of an algebraic number. You can extract its digits with arbitrary precision, do arithmetic, differentiate, etc.
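For instance, here is a minimal sketch of the digit-extraction point (assuming the quartic coefficients A, …, E and a rough Float64 root estimate x0 are already in hand):

function polish_root(A, B, C, D, E, x0; bits = 256)
    setprecision(BigFloat, bits) do
        x = BigFloat(x0)
        for _ in 1:15                               # Newton converges quadratically
            p  = (((A * x + B) * x + C) * x + D) * x + E
            dp = ((4A * x + 3B) * x + 2C) * x + D   # derivative of the quartic
            x -= p / dp
        end
        return x
    end
end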
Also, there's no need to compute this polynomial by hand; one can just use Gröbner bases to do it for you.
Let L(x,y) = \|z_1x + z_2y-d\|_2^2 + \lambda(x^2 + y^2 - 1) be the Lagrangian of the problem. Taking the partial derivatives and eliminating \lambda gives us
\begin{aligned}
y \langle z_1x + z_2y-d,\,z_1\rangle &= x \langle z_1x + z_2y-d,\,z_2\rangle, \\
x^2 + y^2 - 1 &= 0.
\end{aligned}
We now name the variables k = \langle z_1, z_2 \rangle, n_1 = \|z_1\|^2_2, n_2 = \|z_2\|^2_2, w_1 = \langle d, z_1 \rangle, w_2 = \langle d, z_2 \rangle, and feed this into Mathematica (I don't know how to compute Gröbner bases in Julia with symbolic coefficients, if anybody knows I'd love to hear), asking it to eliminate x. This is the same as computing the Gröbner basis with lexicographical ordering. Mathematica then gives us the quartic polynomial