Random question about Differential Evolution: do you happen to know if there’s a variant of DE that keeps all vectors of the population on the unit sphere or on the unit simplex?
My optimization problem is of the form:
\min_x f(x) \quad \text{s.t.} \sum_k x_k^2 = 1
so x should be on the sphere. Thus, I can’t use basic DE because the mutation operation will leave the sphere.
Should I just run the usual DE step (mutation + crossover) and then project the result onto the sphere, like in projected gradient descent? I tried it and it kinda works (seems to minimize the objective), but I don’t know if it’s the right way.
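For concreteness, here is a minimal sketch of that project-after-DE-step idea (assuming the standard DE/rand/1/bin scheme; the toy objective, population size, and parameter values are all just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def project_sphere(x):
    """Renormalize onto the unit sphere (assumes x != 0)."""
    return x / np.linalg.norm(x)

def de_step_projected(pop, f, F=0.8, CR=0.9):
    """One DE/rand/1/bin generation where every trial vector is
    projected back onto the sphere before greedy selection."""
    n_pop, dim = pop.shape
    new_pop = pop.copy()
    for i in range(n_pop):
        # three distinct members, all different from i
        idx = rng.choice([j for j in range(n_pop) if j != i], 3, replace=False)
        a, b, c = pop[idx]
        donor = a + F * (b - c)            # mutation
        mask = rng.random(dim) < CR        # binomial crossover
        mask[rng.integers(dim)] = True     # keep at least one donor gene
        trial = project_sphere(np.where(mask, donor, pop[i]))
        if f(trial) <= f(pop[i]):          # greedy selection
            new_pop[i] = trial
    return new_pop

# toy objective: minimize <x, v> over the sphere; the optimum is -v/||v||
v = np.array([1.0, 2.0, 3.0])
f = lambda x: x @ v
pop = np.stack([project_sphere(rng.standard_normal(3)) for _ in range(20)])
for _ in range(200):
    pop = de_step_projected(pop, f)
best = min(pop, key=f)   # should approach -v/||v||, i.e. f(best) near -sqrt(14)
```

Note that the projection happens after crossover, so the selection step always compares two feasible points.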
I also tried projecting DE’s “donor” points onto the plane tangent to the current x_i, running mutation & crossover on these projections and then projecting/“retracting” the result back onto the sphere. This seems to minimize the objective too, but again I’m not sure if it’s a fluke.
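The building blocks of that second variant look something like this (function names are mine; the “retraction” here is just the metric-projection one, i.e. step and renormalize):

```python
import numpy as np

def tangent_project(y, x):
    """Orthogonal projection of y onto the tangent plane at x (||x|| = 1)."""
    return y - (y @ x) * x

def retract(x, v):
    """Projection retraction: step along the tangent vector, renormalize."""
    z = x + v
    return z / np.linalg.norm(z)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)             # current population member, on the sphere
donor = rng.standard_normal(5)     # e.g. a DE donor vector
v = tangent_project(donor, x)      # its tangent-plane version, <v, x> = 0
x_new = retract(x, v)              # back on the sphere
```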
The Manopt.jl package supports optimization on manifolds (such as the unit sphere) and contains several population-based metaheuristic optimizers such as PSO and CMA-ES.
Yeah, this is very simple, it’s what I was using initially. But I know that f(x) is convex and thus “easy” to optimize + the optimum is unique, so I don’t want to lose convexity.
Doesn’t the y/\lVert y\rVert_2 transformation make the function non-convex in y and thus harder to optimize? Perhaps it introduces local minima?
Similarly, I can use the f(\operatorname{softmax}(y)) transformation for the unit simplex case. But then again, I think this makes the resulting function non-convex and potentially arbitrarily complex, since Softmax is non-convex?
In practice, we use nonlinear conjugate gradient optimization for some fixed number of “sub” iterations with a fixed stereographic projection axis. The outer loop updates the projection axis as the current manifold point.
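A hedged sketch of such a fixed-axis stereographic chart (here the axis is hard-coded to the last coordinate direction; the outer loop described above would re-center the chart at the current manifold point instead):

```python
import numpy as np

def inv_stereographic(y):
    """Inverse stereographic projection R^{n-1} -> S^{n-1} (minus the pole),
    with the projection axis fixed to the last coordinate direction."""
    s = y @ y
    return np.concatenate([2.0 * y, [s - 1.0]]) / (s + 1.0)

# any y in R^3 lands exactly on the unit sphere in R^4, so the inner
# conjugate-gradient iterations can run unconstrained in y
y = np.array([0.3, -1.2, 0.7])
x = inv_stereographic(y)
```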
Concerning convexity: if you minimise a function on the sphere, convexity is tricky. Globally, any function that is convex on all of the sphere is constant – which is a bit boring optimisation-wise, since every point is then a minimiser.
I am a bit biased (developing Manopt.jl and Manifolds.jl) and my knowledge of DE algorithms is a bit vague, but if you determine the direction Y you want to “step into” as a vector tangent to the sphere at the point you are at (that is, \langle x^{(k)}, Y \rangle = 0), you could use any retraction, or even the exponential map, which informally means following the great arc that emanates from the point in direction Y for the length \lVert Y\rVert.
Then you do not even have to think about constraints.
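On the sphere, that exponential map has a simple closed form; a small sketch (assuming Y really is tangent, i.e. \langle x, Y\rangle = 0):

```python
import numpy as np

def sphere_exp(x, Y):
    """Exponential map on the unit sphere: follow the great circle from x
    in the tangent direction Y (<x, Y> = 0) for arc length ||Y||."""
    t = np.linalg.norm(Y)
    if t == 0.0:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (Y / t)

# a quarter turn from e1 towards e2 lands exactly on e2
e1, e2 = np.eye(2)
p = sphere_exp(e1, (np.pi / 2) * e2)
```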
The solution from Chris might also work, but you have to be careful that the values of x_1, \dots, x_{n-1} still have to be constrained.
The change of variables x = y / \Vert y \Vert_2 does this automatically, since \nabla_y f(y / \Vert y \Vert_2) \perp y.
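That orthogonality is easy to check numerically; a quick finite-difference sketch (the linear f here is just a stand-in for the real objective):

```python
import numpy as np

def num_grad(g, y, h=1e-6):
    """Central finite-difference gradient (for checking only)."""
    grad = np.zeros_like(y)
    for i in range(len(y)):
        e = np.zeros_like(y)
        e[i] = h
        grad[i] = (g(y + e) - g(y - e)) / (2 * h)
    return grad

c = np.array([1.0, -2.0, 0.5])
f = lambda x: x @ c                       # any smooth objective on the sphere
g = lambda y: f(y / np.linalg.norm(y))    # composed objective in y
y = np.array([0.9, -0.4, 1.3])
grad = num_grad(g, y)
# <grad, y> vanishes: moving radially in y does not change x = y / ||y||
```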
You have already lost convexity (unless f is constant) because the feasible set is non-convex.
However, if your f(x) has a unique local (= global) optimum on the unit sphere, then f(y / \Vert y \Vert_2) has a unique ray of optima y = \alpha x (\alpha > 0), i.e. a unique value of x = y / \Vert y \Vert_2. Local optimization will find an arbitrary y on that line, but will give the same x.
PS. As an aside, I would try hard to get a gradient of your function, if possible, so that you can use a better algorithm than DE.
Indeed, according to (Boyd & Vandenberghe, section 3.1.1), convexity of a function requires convexity of its domain. The probability simplex \{x : x_k \ge 0, \sum_k x_k = 1\} is convex (looks like a triangle in 3D), but a sphere (as opposed to a ball, for example) is actually not, so I did lose convexity by going to the sphere, huh. I totally missed that, even though it’s literally mentioned in the paper [1] I was inspired by. However, their theorem says that if f(x) is convex, then any 2nd-order KKT point z^* of f(z \odot z) with z on the sphere will give the global minimizer x^* = z^* \odot z^*, so I guess it’s not that bad…
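As a quick sanity check of that reparametrization (nothing from the paper, just the elementary identity that \sum_k z_k^2 = 1 puts z \odot z on the simplex):

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(4)
z /= np.linalg.norm(z)    # z on the unit sphere
x = z * z                 # x = z ⊙ z (elementwise square)
# x_k = z_k^2 >= 0 and sum_k x_k = ||z||^2 = 1, so x is on the simplex
```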
I’m using DE because my actual objective function has other unconstrained variables and appears to have many local minima (but it is convex wrt the vector x on the simplex, with other variables fixed; perhaps this convexity doesn’t actually matter because the function isn’t convex in all of its parameters). I’m trying to leverage DE’s ability to explore the loss landscape and maybe find multiple optima simultaneously. Of course, another option is to use gradient-based optimization with multistart, but I feel like it’s slow.
In a moderate number of dimensions, I’ve had good success with the MLSL multistart algorithm (which can be combined with a gradient-based local optimizer). One thing to be careful of with multistart algorithms is to tune the convergence tolerance of the local optimizer — you can often get away with a coarse convergence tolerance, and just go back at the end and “polish” your best optimum by running a local optimizer with a smaller tolerance.
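A minimal sketch of that coarse-multistart-plus-polish pattern (using SciPy's L-BFGS-B as the local optimizer and a made-up toy objective; a real MLSL implementation, e.g. in NLopt, would replace the naive random starts with its clustering rule):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def f(x):
    """Toy multimodal stand-in for the real objective."""
    return np.sum(x**2) + np.sum(np.sin(3.0 * x))

# coarse phase: many random starts with a loose gradient tolerance
starts = rng.uniform(-2.0, 2.0, size=(30, 2))
coarse = [minimize(f, x0, method="L-BFGS-B", options={"gtol": 1e-2})
          for x0 in starts]
best = min(coarse, key=lambda r: r.fun)

# polish phase: rerun only the best point with a tight tolerance
polished = minimize(f, best.x, method="L-BFGS-B", options={"gtol": 1e-10})
```

The coarse runs stop early, so most of the budget is spent exploring; only the single winner pays for high-precision convergence.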
In your case the projection is given by: {P}_{\mathcal{C}} \left( \boldsymbol{x} \right) = \frac{\boldsymbol{x}}{ {\left\| \boldsymbol{x} \right\|}_{2} }.
You can easily add acceleration to the iteration (FISTA).
It will converge, though no guarantees about the optimality.
Yet, in the case of \mathcal{C} being the unit simplex (a convex set), it will converge to the optimal solution.
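For reference, projection onto the unit simplex is not a simple renormalization like the sphere case; a common sort-based sketch (function name is mine):

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection onto the probability simplex
    {x : x_k >= 0, sum_k x_k = 1}, via the standard sort-based algorithm."""
    u = np.sort(x)[::-1]                  # sort descending
    css = np.cumsum(u)
    # largest index rho with u_rho + (1 - css_rho) / (rho + 1) > 0
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(x)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)  # shift so the result sums to 1
    return np.maximum(x + theta, 0.0)

p = project_simplex(np.array([0.8, 1.1, -0.3]))
```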
Boyd also suggested a simple method to solve these kinds of problems, namely Convex-Concave Programming (CCP). It is treated in the course EE364A. See also:
and
(unfortunately not available in Julia yet, but I recently rewrote this package)
If needed, you can add slack variables to the right sides and increasingly penalize violations.
Typically you can get to a decent solution within only a few iterations, using any off-the-shelf convex solver that can handle your f(x). Not as fancy as the other methods here, of course, but it does work. DCCP handles the reformulation for you automatically.