Strange dependence on domain for polar coordinates in optimization over disk

Hi, I’m trying to optimize a function over a disk using a transformation to polar coordinates. When I restrict the radius to non-negative values, there is no problem, but when I allow it to be negative, I get that the algorithm sometimes does not find the optimum. Although I can just restrict to non-negative values for my use-case, I’m trying to understand if there is a deeper, subtle issue here that can bite me in that case. Is the problem the non-differentiability that I’m introducing by taking the absolute value of the radius?

using GLMakie, Optim, ForwardDiff, StaticArrays, LinearAlgebra

from_polar(a::AbstractVector{<:Real}) = abs(a[1]) .* collect(sincospi(a[2]))

optimize_circle(f, r) = from_polar(Optim.optimize(x -> f(from_polar(x)), SVector(-r, -1), SVector(r, 1), [0.5*r, 0.], Fminbox(BFGS(linesearch = Optim.LineSearches.BackTracking(order=3))); autodiff = :forward).minimizer)
optimize_circle2(f, r) = from_polar(Optim.optimize(x -> f(from_polar(x)), SVector(0, -1), SVector(r, 1), [0.5*r, 0.], Fminbox(BFGS(linesearch = Optim.LineSearches.BackTracking(order=3))); autodiff = :forward).minimizer)

function test(N, r)
    x_points = Matrix{Float64}(undef, 2*N + 1, 4)
    for i in 0:(2*N)
        x_point = r .* [cos(pi*i/N), sin(pi*i/N)]
        x_points[i + 1, 1:2] = x_point
        f_test = x -> norm(x .- x_point)
        x_opt = optimize_circle(f_test, r) 
        x_opt2 = optimize_circle2(f_test, r) 
        x_points[i + 1, 3] = f_test(x_opt)
        x_points[i + 1, 4] = f_test(x_opt2)
    end
    f = Figure()
    ax1 = Axis3(f[1, 1], title = "Optimized norm using [-$r,$r] × [-1, 1]")
    scatter!(ax1, x_points[:, 1], x_points[:, 2], x_points[:,3])
    println("Optimizing over [-$r,$r] × [-1, 1] yields a maximum norm of $(maximum(x_points[:, 3]))")
    ax2 = Axis3(f[1, 2], title = "Optimized norm using [0,$r] × [-1, 1]")
    scatter!(ax2, x_points[:, 1], x_points[:, 2], x_points[:,4])
    println("Optimizing over [0,$r] × [-1, 1] yields a maximum norm of $(maximum(x_points[:, 4]))")
    display(GLMakie.Screen(), f)
end
GLMakie.closeall()
test(100, 1.)

gives

Optimizing over [-1.0,1.0] × [-1, 1] yields a maximum norm of 1.0
Optimizing over [0,1.0] × [-1, 1] yields a maximum norm of 3.3333343259534044e-10

What is also strange is that neither method finds the correct spot if I pick some other values for r, such as r = 0.5:

test(100, 0.5)

gives

Optimizing over [-0.5,0.5] × [-1, 1] yields a maximum norm of 0.5
Optimizing over [0,0.5] × [-1, 1] yields a maximum norm of 0.4045084973145854

I think that the problem was using the absolute values. Allowing instead for negative radii (I’ll call this the “double cover” case) seems to solve the problem just fine:

using GLMakie, Optim, ForwardDiff, StaticArrays, LinearAlgebra

from_polar(a::AbstractVector{<:Real}) = abs(a[1]) .* collect(sincospi(a[2]))
from_polar_double(a::AbstractVector{<:Real}) = a[1] .* collect(sincospi(a[2]))

optimize_circle(f, r) = from_polar(Optim.optimize(x -> f(from_polar(x)), SVector(-r, -1), SVector(r, 1), [0.5*r, 0.], Fminbox(BFGS(linesearch = Optim.LineSearches.BackTracking(order=3))); autodiff = :forward).minimizer)
optimize_circle2(f, r) = from_polar(Optim.optimize(x -> f(from_polar(x)), SVector(0, -1), SVector(r, 1), [0.5*r, 0.], Fminbox(BFGS(linesearch = Optim.LineSearches.BackTracking(order=3))); autodiff = :forward).minimizer)
optimize_circle_double(f, r) = from_polar_double(Optim.optimize(x -> f(from_polar_double(x)), SVector(-r, -1), SVector(r, 1), [0.5*r, 0.], Fminbox(BFGS(linesearch = Optim.LineSearches.BackTracking(order=3))); autodiff = :forward).minimizer)

function test(N, r)
    x_points = Matrix{Float64}(undef, 2*N + 1, 5)
    for i in 0:(2*N)
        x_point = r .* [cos(pi*i/N), sin(pi*i/N)]
        x_points[i + 1, 1:2] = x_point
        f_test = x -> norm(x .- x_point)
        x_opt = optimize_circle(f_test, r) 
        x_opt2 = optimize_circle2(f_test, r) 
        x_opt3 = optimize_circle_double(f_test, r)
        x_points[i + 1, 3] = f_test(x_opt)
        x_points[i + 1, 4] = f_test(x_opt2)
        x_points[i + 1, 5] = f_test(x_opt3)
    end
    f = Figure()
    ax1 = Axis3(f[1, 1], title = "Optimized norm using [-$r,$r] × [-1, 1]")
    scatter!(ax1, x_points[:, 1], x_points[:, 2], x_points[:,3])
    println("Optimizing over [-$r,$r] × [-1, 1] yields a maximum norm of $(maximum(x_points[:, 3]))")
    ax2 = Axis3(f[1, 2], title = "Optimized norm using [0,$r] × [-1, 1]")
    scatter!(ax2, x_points[:, 1], x_points[:, 2], x_points[:,4])
    println("Optimizing over [0,$r] × [-1, 1] yields a maximum norm of $(maximum(x_points[:, 4]))")
    ax3 = Axis3(f[1, 3], title = "Optimized norm using [-$r,$r] × [-1, 1]")
    scatter!(ax3, x_points[:, 1], x_points[:, 2], x_points[:,5])
    println("Optimizing over [-$r, $r] × [-1, 1] using double cover yields a maximum norm of $(maximum(x_points[:, 5]))")
    display(GLMakie.Screen(), f)
end
GLMakie.closeall()
test(100, 0.5)
test(100, 1.)
test(100, 2)

gives

Optimizing over [-0.5,0.5] × [-1, 1] yields a maximum norm of 0.5
Optimizing over [0,0.5] × [-1, 1] yields a maximum norm of 0.4045084973145854
Optimizing over [-0.5, 0.5] × [-1, 1] using double cover yields a maximum norm of 1.5677659170876268e-10
Optimizing over [-1.0,1.0] × [-1, 1] yields a maximum norm of 1.0
Optimizing over [0,1.0] × [-1, 1] yields a maximum norm of 3.3333343259534044e-10
Optimizing over [-1.0, 1.0] × [-1, 1] using double cover yields a maximum norm of 3.9689496134087676e-10
Optimizing over [-2,2] × [-1, 1] yields a maximum norm of 2.0
Optimizing over [0,2] × [-1, 1] yields a maximum norm of 2.00000000025004
Optimizing over [-2, 2] × [-1, 1] using double cover yields a maximum norm of 9.771419229309686e-10

I guess my question now is: is this (= the double cover way) the recommended way to proceed, or is there a better way to tackle such problems? What I don’t like too much about the current approach is that it effectively doubles the domain over which we’re looking, which may increase the computational cost.

You are using an algorithm (BFGS) that implicitly assumes that the function is not only differentiable, but twice differentiable, so all bets are off if you give it a non-differentiable function. (In my experience, BFGS algorithms are especially fragile to non-differentiability or errors in the gradient.)

When doing any kind of local optimization, you usually want to try very hard to ensure that your functions are differentiable (even if you are employing algorithms that don’t explicitly require a gradient).

On a separate note: Here, you are constraining \theta \in [-\pi, \pi]. I wouldn’t do this — the problem is that, by bounding \theta, you can get artificially trapped in a local minimum at the boundary.

I would suggest making \theta unbounded (or at least increasing the \theta domain to [-2\pi, +2\pi] and using a starting point \in [-\pi, \pi]). You can always wrap \theta back into whatever range you want at the end.

In local optimization algorithms like BFGS, the size of the domain doesn’t directly affect computational cost. It’s essentially just going (roughly) downhill from your starting point, not doing an exhaustive search of the domain.

Second, if you are optimizing in polar coordinates, I’m not sure why restricting the radius to be nonnegative would be a problem. You can still have smooth descent paths that pass arbitrarily close to the origin while keeping the radius nonnegative. (That is, there should be no artificial local minima created by an r \ge 0 constraint.)

Regardless, you do have to be generally cautious of coordinate singularities at the origin, especially if you aren’t using analytical derivatives (either manually or via AD).