Efficient optimization of objective with cos and sin terms

Given the following objective:

J(t) = ||z_1 \cos(t) + z_2 \sin(t) - d||_2^2

with t \in [0,2\pi] and z_1,z_2,d \in \mathbb{R}^n. How would you minimize this objective efficiently? I am trying to derive an analytical solution, but am still not sure if it is possible.

If anyone has suggestions, please feel free to share. It is easy to use Optim.jl or any black-box optimization package to solve the problem, but I wonder if we can do better.
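For concreteness, here is the sort of black-box baseline I have in mind: a minimal sketch using Optim.jl’s default univariate Brent method (which only finds a local minimizer):

using Optim

z1, z2, d = randn(5), randn(5), randn(5)             # example data
J(t) = sum(abs2, z1 .* cos(t) .+ z2 .* sin(t) .- d)
res = optimize(J, 0.0, 2π)                           # Brent’s method on [0, 2π]
t_min = Optim.minimizer(res)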

I don’t think the task has a nice, analytical solution. Here’s a sketch:

First, let’s substitute t = \arccos(u), where u \in [-1, 1]. (This parameterizes t \in [0, \pi], where \sin(t) = \sqrt{1 - u^2}; the other half of the circle, with \sin(t) = -\sqrt{1 - u^2}, has to be checked separately.) Then,

\begin{align}J(t) = \tilde{J}(u) &= \lVert \mathbf{z}_1 \cos(\arccos(u)) + \mathbf{z}_2 \sin(\arccos(u)) - \mathbf{d} \rVert_2^2 \\ &=\lVert \mathbf{z}_1 u + \mathbf{z}_2 \sqrt{1 - u^2} - \mathbf{d} \rVert_2^2 \end{align}.

The norm expands to

\begin{align} \tilde{J}(u) &= \lVert \mathbf{z}_1 u\rVert^2_2 + 2 (u \sqrt{1-u^2}\mathbf{z}_1^{\top} \mathbf{z}_2 - u \mathbf{z}_1^\top \mathbf{d} - \sqrt{1-u^2}\mathbf{z}_2^\top \mathbf{d}) + \lVert \mathbf{z}_2\sqrt{1-u^2} \rVert_2^2 + \lVert \mathbf{d} \rVert_2^2 \\ &= u^2 (\lVert \mathbf{z}_1\rVert^2_2 - \lVert \mathbf{z}_2\rVert^2_2) + 2u \sqrt{1-u^2}(\mathbf{z}_1^{\top} \mathbf{z}_2) - 2u (\mathbf{z}_1^\top \mathbf{d}) - 2\sqrt{1-u^2}(\mathbf{z}_2^\top \mathbf{d}) + \text{const.} \end{align}

Denoting

\begin{align} \alpha &= \lVert \mathbf{z}_1\rVert^2_2 - \lVert \mathbf{z}_2\rVert^2_2, \\ \beta &= \mathbf{z}_1^{\top} \mathbf{z}_2, \\ \gamma &= \mathbf{z}_1^\top \mathbf{d}, \\ \delta &= \mathbf{z}_2^\top \mathbf{d}, \end{align}

we minimize

\tilde{J}(u) = \alpha u^2 + 2\beta u\sqrt{1-u^2} - 2\gamma u - 2 \delta \sqrt{1-u^2} + \text{const.}

The function is not convex in general; it can have up to three extrema, two of which are local minima and one a local maximum, I think. You might derive an analytical solution by finding the roots of the first derivative, but the solution is likely going to be ugly. So you would probably go for a numerical solution anyway. I think that, e.g., Newton’s method would converge fast enough (with initialization like u_0 = \pm 1).
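A rough sketch of that idea (my code, no safeguards, derivatives via ForwardDiff; as noted above, this branch assumes \sin(t) = +\sqrt{1-u^2}, i.e. t \in [0, \pi], and the other branch must be checked separately):

using ForwardDiff, LinearAlgebra

function newton_on_u(z1, z2, d; u0 = 0.99, iters = 20)
    α = dot(z1, z1) - dot(z2, z2)
    β = dot(z1, z2); γ = dot(z1, d); δ = dot(z2, d)
    Jt(u) = α*u^2 + 2β*u*sqrt(1 - u^2) - 2γ*u - 2δ*sqrt(1 - u^2)
    u = u0
    for _ in 1:iters
        g = ForwardDiff.derivative(Jt, u)
        h = ForwardDiff.derivative(v -> ForwardDiff.derivative(Jt, v), u)
        u = clamp(u - g/h, -1 + 1e-9, 1 - 1e-9)   # Newton step, kept inside (-1, 1)
    end
    return u, Jt(u)
end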

Maybe this specific (more informed) approach is better than a general method from Optim.jl in some aspects, but I have no idea really.

3 Likes

A black-box method might have the problem that the objective is defined on the circle and hence intrinsically non-convex; you could also use a gradient-descent or Newton method on the circle, though.

With box constraints you might be lucky enough to choose the right interval ([0,2\pi] or [-\pi,\pi] or any other shift is possible, and they all might yield different local minima); on the circle, by contrast, you do not have to make that choice. And then using particle swarm (maybe even Nelder–Mead) or LTMADS might still be nice for finding a global minimum.
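On the circle the exponential map is just addition of angles, so Riemannian gradient descent reduces to a very small sketch (my code, untested):

using ForwardDiff

function circle_gradient_descent(J; t0 = 0.0, step = 0.1, iters = 200)
    t = t0
    for _ in 1:iters
        t -= step * ForwardDiff.derivative(J, t)   # Euclidean step in the angle
        t = mod2pi(t)                              # wrap back onto the circle
    end
    return t, J(t)
end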

1 Like

It’s periodic in t, so any such interval is equivalent.

I think you’re over-thinking this. Yes, it’s non-convex. Yes, it might have multiple local minima. But it’s only a function of a single variable, and it’s not that oscillatory, so you can easily brute-force it.

Nelder–Mead is only local minimization (and is an outdated algorithm that is not even guaranteed to converge to a local minimum). Particle swarm, like genetic algorithms, is one of those algorithms that I personally think gets disproportionate attention because it has a nice story/picture, as opposed to algorithms that just have math. LTMADS is more for handling constraints.

For a simple box-constrained problem in one variable I would probably just use DIRECT or maybe MLSL.

6 Likes

For example, with

using Plots, NLopt, ForwardDiff
J(t, z₁, z₂, d) = abs2(z₁*cos(t) + z₂*sin(t) - d)

I picked some random values for the parameters:

@show z₁, z₂, d = randn(3)
t = range(0, 2π, length=1000)   # plotting grid
plot(t/2π, J.(t, z₁, z₂, d), label="J(t)", xlabel="t/2π", ylabel="J")

giving

(z₁, z₂, d) = randn(3) = [-1.5755713126457926, 1.0035894555331297, -0.17311602012804714]

and a function with a couple of local minima:

[plot of J(t) vs. t/2π over one period, showing two local minima]
To minimize it, I did two passes using NLopt.jl: first, a global search using the DIRECT-L algorithm, which is essentially a slightly clever brute-force search, and told it to find the minimum to within 5% of 2π. This takes only 21 iterations … honestly, you might as well use a grid search for this. Then I used this as a starting point for a gradient-based local search, which converged to 6 digits in 5 iterations:

function J_objective(x, grad, z₁, z₂, d)
    t = x[1]
    if !isempty(grad)
        grad[1] = ForwardDiff.derivative(t -> J(t, z₁, z₂, d), t)
    end
    return J(t, z₁, z₂, d)
end

global_opt = NLopt.Opt(:GN_DIRECT_L, 1)
NLopt.lower_bounds!(global_opt, [0.0])
NLopt.upper_bounds!(global_opt, [2π])
NLopt.xtol_abs!(global_opt, 2π * 0.05)
NLopt.min_objective!(global_opt, (x, grad) -> J_objective(x, grad, z₁, z₂, d))
@show min_f1, min_x1, ret1 = NLopt.optimize(global_opt, [1π])
@show NLopt.numevals(global_opt)

local_opt = NLopt.Opt(:LD_LBFGS, 1)
NLopt.lower_bounds!(local_opt, [0.0])
NLopt.upper_bounds!(local_opt, [2π])
NLopt.xtol_abs!(local_opt, 1e-6)
NLopt.min_objective!(local_opt, (x, grad) -> J_objective(x, grad, z₁, z₂, d))
@show min_f, min_x, ret = NLopt.optimize(local_opt, min_x1)
@show NLopt.numevals(local_opt)

yielding:

(min_f1, min_x1, ret1) = NLopt.optimize(global_opt, [1π]) = (0.00017364873328543448, [0.9179139029007216], :XTOL_REACHED)
NLopt.numevals(global_opt) = 21
(min_f, min_x, ret) = NLopt.optimize(local_opt, min_x1) = (7.55041575397572e-30, [0.9108315038455206], :SUCCESS)
NLopt.numevals(local_opt) = 5

The result corresponds to t/2π = min_x[1] / 2π of 0.14496333616083928, which is the first minimum in the plot above. The other minimum is actually equally good, so which one it finds is somewhat random.

Alternatively, since this is a smooth function of 1 variable, you can fit it to a polynomial to machine precision using Chebyshev approximation, which only takes about 30 function evaluations. Then you can find all the extrema of the polynomial by taking the derivative and finding the roots via a companion-matrix method. ApproxFun.jl does all of this for you, and in fact can do it in a Fourier-series basis that takes the periodicity into account:

using ApproxFun
Jf = Fun(t -> J(t, z₁, z₂, d), Fourier())
r = roots(Jf')
@show r[findmin(Jf.(r))[2]]

which gives

r[(findmin(Jf.(r)))[2]] = 4.238034418629369

which corresponds to t/2π ≈ 0.6745, the second local minimum in the plot above. The nice thing about this is that it gives you the root to machine precision, and handles all of the details for you.

PS. I’m actually not sure what method ApproxFun uses to find roots of a Fourier series. @dlfivefifty?

PPS. You could even just do a grid search and then polish the root with a couple of Newton steps on the gradient, no fancy libraries required.
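For example, a quick sketch of that (my code):

using ForwardDiff

function grid_then_newton(J; N = 50, steps = 5)
    grid = range(0, 2π, length = N + 1)[1:end-1]   # periodic, so drop the duplicate endpoint
    t = grid[argmin(J.(grid))]                      # best coarse grid point
    J′(s) = ForwardDiff.derivative(J, s)
    J″(s) = ForwardDiff.derivative(J′, s)
    for _ in 1:steps
        t -= J′(t) / J″(t)                          # Newton step on the gradient
    end
    return t, J(t)
end

# e.g. grid_then_newton(t -> J(t, z₁, z₂, d))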

PPPS. If you really need to do this blindingly fast, so that the above aren’t fast enough for you, this is a simple enough function that I suspect you could do some more analysis to narrow things down, e.g. to prove things about the kinds of local minima it might have — experimentally it always seems to have one local minimum or two equivalent local minima, in which case a single local search would suffice. But my basic point is that this particular problem is trivial to brute-force, so it might not be worth a lot of head-scratching unless you really care about it.

10 Likes

Another approach I was thinking of using is interval optimization, as it should be able to identify all the minima, as opposed to just one.

1 Like

For some reason IntervalOptimisation.jl is failing to return a reduced set of minima:

julia> J(t) = sum(abs2, z1 * cos(t) + z2 * sin(t) - d)
J (generic function with 1 method)

julia> z1 = rand(3)
3-element Vector{Float64}:
 0.09619401196573985
 0.6093934853421249
 0.3455165955548245

julia> z2 = rand(3)
3-element Vector{Float64}:
 0.5056285059545951
 0.4061191884336536
 0.7118690622027726

julia> d = rand(3)
3-element Vector{Float64}:
 0.26683120670297666
 0.28572325462312265
 0.33648095469233374

julia> J(1)
0.3945903894962912

julia> using IntervalOptimisation

julia> using IntervalArithmetic

julia> Jmin, ts = minimize(J, 0..2π)
([0.0792811, 0.0795003], Interval{Float64}[[2.02874, 2.02955], [2.02796, 2.02875], [2.02718, 2.02797], [2.02562, 2.02641], [2.0264, 2.02719], [2.02484, 2.02563], [2.02407, 2.02485], [2.03027, 2.03102], [2.02954, 2.03028], [2.03101, 2.03175]  …  [2.03923, 2.04], [2.01562, 2.0164], [2.03999, 2.04076], [2.01485, 2.01563], [2.04075, 2.04154], [2.01409, 2.01486], [2.04153, 2.04227], [2.01331, 2.0141], [2.04226, 2.04302], [2.01255, 2.01332]])

julia> ts
40-element Vector{Interval{Float64}}:
 [2.02874, 2.02955]
 [2.02796, 2.02875]
 [2.02718, 2.02797]
 [2.02562, 2.02641]
 [2.0264, 2.02719]
 [2.02484, 2.02563]
 [2.02407, 2.02485]
 [2.03027, 2.03102]
 [2.02954, 2.03028]
 [2.03101, 2.03175]
 [2.02253, 2.02332]
 [2.02331, 2.02408]
 [2.03174, 2.0325]
 ⋮
 [2.01639, 2.01719]
 [2.03848, 2.03924]
 [2.03923, 2.04]
 [2.01562, 2.0164]
 [2.03999, 2.04076]
 [2.01485, 2.01563]
 [2.04075, 2.04154]
 [2.01409, 2.01486]
 [2.04153, 2.04227]
 [2.01331, 2.0141]
 [2.04226, 2.04302]
 [2.01255, 2.01332]

You are right; probably because I usually do not work with single-variable problems.
Concerning Nelder–Mead and LTMADS: there, too, I have to confess I am far behind your knowledge of those; for example, I have no clue about DIRECT or MLSL.
Sorry, will not do that again.

OK, let’s take a stab at this:

f(t) = \left\lVert x \right\rVert^{2}_{2} = \sum_i{x_i^2} = \sum_i{(z_{1,i}\cos(t) + z_{2,i}\sin(t) - d_{i})^2}

at the minimum, \frac{\partial f}{\partial t} = 0 :

\begin{align} 0 = \frac{\partial f}{\partial t} &= \frac{\partial}{\partial t}\sum_i{(z_{1,i}\cos(t) + z_{2,i}\sin(t) - d_{i})^2} = \sum_i{\frac{\partial}{\partial t}(z_{1,i}\cos(t) + z_{2,i}\sin(t) - d_{i})^2} \\ 0 &= \sum_i{\left[(-z_{1,i}^2 + z_{2,i}^2)\sin(t)\cos(t) + z_{1,i}z_{2,i}\cos^2(t) - z_{1,i}z_{2,i}\sin^2(t) - d_{i}z_{1,i}\sin(t) - d_{i}z_{2,i}\cos(t)\right]} \end{align}

if we define:

\begin{align} k_1 &= \sum_i(z_{2,i}^2 - z_{1,i}^2) \\ k_2 &= \sum_i{z_{1,i}z_{2,i}} = \langle z_{1},z_{2} \rangle \\ w_1 &= \langle d,z_{1} \rangle \\ w_2 &= \langle d,z_{2} \rangle \\ x &= \sin(t), \quad y = \cos(t) \end{align}

Then the stationarity condition becomes the following system of equations:

0 = k_1 xy + k_2y^2 - k_2x^2 - w_1x -w_2y \\ 0 = x^2 + y^2 - 1

we can obtain x = f(y) with some substitution:

x = \frac{2k_2y^2 - w_2y - k_2}{w_1 - k_1y}

and then replace x in the second constraint:

\left( \frac{2k_2y^2 - w_2y - k_2}{w_1 - k_1y} \right)^2 + y^2 - 1 = 0

That is a quartic polynomial in y (there are formulas for the analytic solution of quartics, or you can use a polynomial solver). You can then find all the real roots of the polynomial, discard the values of y that are not in [-1,1], and check which of the remaining ones satisfies the minimum condition.

If there is something wrong, let me know; I was solving this as I wrote it. In any case, if there is a solution, it involves obtaining a polynomial system in x, y with coefficients that are reductions of the input vectors.
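A small numerical sketch of this recipe (my code; the helper name is just for illustration): build the quartic by clearing the denominator with Polynomials.jl and keep the real roots with |y| \le 1; x is then recovered from x = (2k_2y^2 - w_2y - k_2)/(w_1 - k_1y) and t from t = \operatorname{atan}(x, y), since x = \sin(t), y = \cos(t).

using Polynomials, LinearAlgebra

function quartic_roots_y(z1, z2, d)
    k1 = dot(z2, z2) - dot(z1, z1)
    k2 = dot(z1, z2)
    w1 = dot(d, z1); w2 = dot(d, z2)
    y = Polynomial([0.0, 1.0])                  # the indeterminate y = cos(t)
    num = 2k2*y^2 - w2*y - k2                   # numerator of x(y)
    den = w1 - k1*y                             # denominator of x(y)
    q = num^2 + (y^2 - 1)*den^2                 # x(y)² + y² − 1 = 0, denominator cleared
    [real(r) for r in roots(q) if abs(imag(r)) < 1e-10 && abs(real(r)) ≤ 1]
end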

4 Likes

I think there’s a bug here. Refining the tolerance (keyword argument tol) results in even more minimizer intervals, most of which overlap. @dpsanders perhaps you’re interested.

1 Like

I will add here a reformulation for your problem, although I guess that you, @juliohm, are not only aware of it but also that this was actually the original problem that you reformulated into the single-variable problem. Here it goes:

\begin{aligned} \operatorname*{minimize}_{x\in\mathbb R^2} &\quad \|Ax-b\|_2\\ \text{subject to} &\quad \|x\| = 1, \end{aligned}

where A\in\mathbb R^{n\times 2}, and b \in \mathbb R^n.

It can be viewed as an instance of a quadratically constrained quadratic program (QCQP), obviously a nonconvex one.

One way – and perhaps the best one – is by direct parameterisation of the feasible set (the unit circle) using a single real variable (the angle).

Another way could be the following. Upon introducing a Lagrange multiplier \lambda\in\mathbb R, we can write the necessary conditions of optimality as

\begin{aligned} (A^\top A + \lambda I) x &= A^\top b \\ x^\top x - 1 &= 0. \end{aligned}

The matrix defining the first equation is 2\times 2, hence a formula can be written for its solution as a function of \lambda

x(\lambda) = (A^\top A + \lambda I)^{-1}A^\top b.

We can then substitute into the left hand side of the second equation

e(\lambda) := x(\lambda)^\top x(\lambda) - 1.

What we are after are the roots of this function. The function will generally be a ratio of two polynomials in \lambda, but since we are interested in the roots, the denominator can be cleared out and we are left with the quartic numerator polynomial. The original problem thus boils down to solving a quartic equation.

Whether this offers any advantage over the solution based on parameterising the unit circle using a single real variable and solving a nonconvex optimization on an interval, I do not know. Honestly, I started this with the hope that the dependence would perhaps turn out quadratic … Anyway, I wanted to share this alternative here. Perhaps somebody can build on top of this.

using LinearAlgebra
using Symbolics

A = rand(10,2)
b = rand(10)
@variables λ
x = (A'*A + λ*I)\(A'*b)
constraint_sym = simplify(x'*x - 1)

which gives (for one random draw of A and b):

(-2.1496826938716005 - 20.318711501889148λ - 33.3470944638764(λ^2) - 14.489160786937937(λ^3) - (λ^4)) / ((2.8983709136585443 + 7.2445803934689685λ + λ^2)^2)

Below I do not aim at solving the quartic equation; instead, I only plot the function.

constraint_expr = build_function(constraint_sym, λ)
constraint_fun = eval(constraint_expr)

using Plots
lambda = range(-15, 10, length=2000)
plot(lambda, constraint_fun.(lambda), xlabel="λ", ylabel="x(λ)'x(λ)-1", ylims=(-1,1))

5 Likes

It computes eigenvalues of the companion matrix.

For Chebyshev expansions it subdivides the interval to control the polynomial order and computes eigenvalues of the colleague matrix.
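For reference, a minimal illustration of the companion-matrix idea (my sketch, not ApproxFun’s actual code), for a monic polynomial p(z) = z^n + a_{n-1}z^{n-1} + \dots + a_0:

using LinearAlgebra

function companion(a::Vector)        # a = [a0, a1, …, a_{n-1}]
    n = length(a)
    C = zeros(n, n)
    C[2:end, 1:end-1] .= I(n-1)      # subdiagonal of ones
    C[:, end] .= -a                  # last column carries the coefficients
    return C
end

eigvals(companion([-1.0, 0.0]))      # roots of z² − 1, i.e. ±1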

1 Like

Thanks, I didn’t realize that trigonometric polynomials (truncated Fourier series) had a companion matrix. I see that there is an article by Boyd (2013) on this.

2 Likes

Roots of

f(z) = c_{-m}z^{-m} + \cdots + c_m z^m

are also roots of the polynomial

z^m f(z) = c_{-m} + \cdots + c_m z^{2m}

Oh, duh, I wasn’t thinking about z = e^{i\phi}.

In the OP’s case, it’s therefore straightforwardly a low-degree polynomial in x = e^{it}.
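Concretely, a quick sketch of that observation (my code, not ApproxFun’s): J'(t)/2 = (b/2)\sin(2t) + a\cos(2t) + d_1\sin(t) - d_2\cos(t) with a = z_1\cdot z_2, b = \lVert z_2\rVert^2 - \lVert z_1\rVert^2, d_1 = d\cdot z_1, d_2 = d\cdot z_2, so z^2 J'(t)/2 is an ordinary quartic in z = e^{it} whose unit-circle roots give the stationary angles:

using Polynomials, LinearAlgebra

function stationary_angles(z1, z2, d)
    a  = dot(z1, z2)
    b  = dot(z2, z2) - dot(z1, z1)
    d1 = dot(d, z1); d2 = dot(d, z2)
    c2 = a/2 - im*b/4                 # coefficient of z² (conjugate for z⁻²)
    c1 = -d2/2 - im*d1/2              # coefficient of z¹ (conjugate for z⁻¹)
    p  = Polynomial([conj(c2), conj(c1), 0im, c1, c2])      # z²·J′(t)/2
    [angle(z) for z in roots(p) if abs(abs(z) - 1) < 1e-8]  # keep unit-circle roots
end
# evaluate J at each returned angle and keep the smallest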

Thank you all for the super helpful insights.

@longemen3000 I believe there is an issue with your derivation. I did it here on paper and the sign of the d_i z_{1,i} \sin(t) term should be positive. Could you please double check?

I am planning to combine your explicit formula of the quartic polynomial with the insights shared by @zdenek_hurak to write down an explicit solution of the 4 roots with Ferrari’s method. I believe this solution will only require 4 evaluations of the objective, and because the constants are all inner products, we can do everything without array allocations, directly on the stack.

2 Likes

By the way, the approach that I described here, which expresses the solution x as a function of \lambda and then uses the constraint to infer the acceptable values of \lambda is reportedly known as “secular equation”, see pages 8 through 10 (or so) in the slides https://people.inf.ethz.ch/gander/talks/Talk2018.pdf. The paper by the numerical computing giant Gene Golub they refer to (and from which they show some screenshosts) is Forsythe, George E., and Gene H. Golub. ‘On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere’. Journal of the Society for Industrial and Applied Mathematics 13, no. 4 (December 1965): 1050–68. https://doi.org/10.1137/0113073.

2 Likes

I did some tests, but apparently the quartic polynomial I am coding has complex roots. Maybe I derived something wrong?

using LinearAlgebra

# quartic polynomial Ax⁴ + Bx³ + Cx² + Dx + E = 0 with x = sin(t)
function polynomial(z₁, z₂, d)
  a = z₁ ⋅ z₂
  b = z₂ ⋅ z₂ - z₁ ⋅ z₁
  d₁ = d ⋅ z₁
  d₂ = d ⋅ z₂
  A = 4 * a * a + b * b
  B = -(4 * a * d₁ + 2 * b * d₂)
  C = -4 * a * a + d₁ * d₁ + d₂ * d₂ - b * b
  D = 2 * (a * d₁ - b * d₂)
  E = a * a + d₂ * d₂
  A, B, C, D, E
end

Here is a small test:

julia> z1 = rand(3)
3-element Vector{Float64}:
 0.021553008480093894
 0.5376179480003928
 0.056695819740902764

julia> z2 = rand(3)
3-element Vector{Float64}:
 0.20214667066080527
 0.36051053300345526
 0.4475695503545505

julia> d = rand(3)
3-element Vector{Float64}:
 0.6774110630795951
 0.9644071492859374
 0.7483034986631949

julia> A, B, C, D, E = polynomial(z1, z2, d)
(0.2060493039312763, -0.6431821630318287, 0.7967953963082095, 0.12874439054909456, 0.7216088554970678)

Computing the roots with Polynomials.jl:

julia> using Polynomials

julia> roots(Polynomial(reverse([A,B,C,D,E])))
4-element Vector{ComplexF64}:
 -0.28659638301251533 - 0.7584459715882509im
 -0.28659638301251533 + 0.7584459715882509im
   1.8473445892921885 - 1.383735013586352im
   1.8473445892921885 + 1.383735013586352im

Computing with the lightweight AMRVW.jl:

julia> using AMRVW

julia> AMRVW.roots(reverse([A,B,C,D,E]))
4-element Vector{ComplexF64}:
 -0.2865963830125152 - 0.7584459715882509im
 -0.2865963830125152 + 0.7584459715882509im
  1.8473445892921854 - 1.3837350135863542im
  1.8473445892921854 + 1.3837350135863542im

Before I try to debug my local implementation of Ferrari’s method, it would be nice to understand what is happening.

The derivation, adapted from @longemen3000’s derivation above:

\frac{\partial f}{\partial t}(t) = 2\sum_i [z_{1,i} \cos(t) + z_{2,i} \sin(t) - d_i][-z_{1,i}\sin(t) + z_{2,i}\cos(t)]

Expanding the terms and equating to zero:

\sum_i z_{1,i} z_{1,i} (-\sin(t) \cos(t)) + z_{1,i} z_{2,i} (\cos(t)^2) - z_{1,i} z_{2,i}(\sin(t)^2) + z_{2,i} z_{2,i} (\sin(t) \cos(t)) + d_i z_{1,i} \sin(t) - d_i z_{2,i} \cos(t) = 0

and in terms of inner products:

(z_2 \cdot z_2 - z_1 \cdot z_1)\sin(t)\cos(t) + z_1 \cdot z_2 \cos(t)^2 - z_1 \cdot z_2 \sin(t)^2 + d \cdot z_1 \sin(t) - d \cdot z_2 \cos(t) = 0

We set the constants

a = z_1 \cdot z_2, b = z_2 \cdot z_2 - z_1 \cdot z_1, d_1 = d \cdot z_1 and d_2 = d \cdot z_2

and introduce the variables x = \cos(t) and y = \sin(t) to get

bxy + ax^2 - ay^2 + d_1 y - d_2 x = 0

We can isolate x by substituting x^2 = 1 - y^2 (and assuming that by - d_2 \ne 0):

x = \frac{2ay^2 - d_1 y - a}{by - d_2}

Plugging this value back into x^2 + y^2 - 1 = 0, we obtain:

\left(\frac{2ay^2 - d_1 y - a}{by - d_2}\right)^2 + y^2 - 1 = 0

If we multiply all terms by (by - d_2)^2, we get a quartic polynomial Ay^4 + By^3 + Cy^2 + Dy + E = 0 with coefficients

A = 4a^2 + b^2
B = -4ad_1 - 2bd_2
C = -4a^2 + d_1^2 + d_2^2 - b^2
D = 2ad_1 - 2bd_2
E = a^2 + d_2^2
1 Like

It’s definitely better to use the quartic polynomial. That’s an analytical solution: the minimal polynomial is a perfectly good representation of an algebraic number. You can extract its digits to arbitrary precision, do arithmetic, differentiate, etc.

Also, there’s no need to compute this polynomial by hand; one can just use Gröbner bases to do it for you.

Let L(x,y) = \|z_1x + z_2y-d\|_2^2 + \lambda(x^2 + y^2 - 1) be the Lagrangian of the problem. Taking the partial derivatives gives us

\frac{\partial}{\partial x} L(x,y) = 2 \langle z_1x + z_2y-d,z_1\rangle + 2\lambda x \\ \frac{\partial}{\partial y} L(x,y) = 2 \langle z_1x + z_2y-d,z_2\rangle + 2\lambda y \\ \frac{\partial}{\partial \lambda} L(x,y) = x^2 + y^2 - 1

Which imply

y \langle z_1x + z_2y-d,z_1\rangle = x \langle z_1x + z_2y-d,z_2\rangle \\ x^2 + y^2 - 1 = 0

We now name the variables k = \langle z_1, z_2 \rangle, n_1 = \|z_1\|^2_2, n_2 = \|z_2\|^2_2, w_1 = \langle d, z_1 \rangle, w_2 = \langle d, z_2 \rangle, and feed this into Mathematica (I don’t know how to compute Gröbner bases in Julia with symbolic coefficients, if anybody knows I’d love to hear), asking it to eliminate x. This is the same as computing the Gröbner basis with lexicographical ordering. Mathematica then gives us the quartic polynomial

f(y) = (4 k^2 + n_1^2 - 2 n_1 n_2 + n_2^2) y^4 + (-4 k w_1 + 2 n_1 w_2 - 2 n_2 w_2) y^3 \\ + (-4 k^2 - n_1^2 + 2 n_1 n_2 - n_2^2 + w_1^2 + w_2^2) y^2 + (2 k w_1 - 2 n_1 w_2 + 2 n_2 w_2) y + k^2 - w_2^2

Take its real roots and you’re good to go:

using Polynomials, LinearAlgebra

function analytical(z1, z2, d)
    k = dot(z1, z2)
    n1 = norm(z1)^2
    n2 = norm(z2)^2
    w1 = dot(d, z1)
    w2 = dot(d, z2)
    c0 = k^2 - w2^2
    c1 = 2k * w1 - 2n1 * w2 + 2n2 * w2
    c2 = -4k^2 - n1^2 + 2n1 * n2 - n2^2 + w1^2 + w2^2
    c3 = -4k * w1 + 2n1 * w2 - 2n2 * w2
    c4 = 4k^2 + n1^2 - 2n1 * n2 + n2^2
    p = Polynomial([c0, c1, c2, c3, c4])
    return roots(p)
end
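A possible usage sketch on top of this (my code; best_t is a hypothetical helper): the real roots are values of y = \sin(t), so recover t by trying both signs of \cos(t) and keeping the stationary point with the smallest objective.

J(t, z1, z2, d) = sum(abs2, z1 .* cos(t) .+ z2 .* sin(t) .- d)

function best_t(z1, z2, d)
    ys = [real(r) for r in analytical(z1, z2, d) if abs(imag(r)) < 1e-8 && abs(real(r)) ≤ 1]
    ts = [atan(y, s*sqrt(1 - y^2)) for y in ys for s in (-1.0, 1.0)]   # y = sin(t), ±cos(t)
    return ts[argmin([J(t, z1, z2, d) for t in ts])]
end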
3 Likes