Random question about Differential Evolution: do you happen to know if there’s a variant of DE that keeps all vectors of the population on the unit sphere or on the unit simplex?
My optimization problem is of the form:
\min_x f(x) \quad \text{s.t.} \sum_k x_k^2 = 1
so x should be on the sphere. Thus, I can’t use basic DE because the mutation operation will leave the sphere.
Should I just run the usual DE step (mutation + crossover) and then project the result onto the sphere, like in projected gradient descent? I tried it and it kinda works (seems to minimize the objective), but I don’t know if it’s the right way; a rough sketch of what I mean is below.
I also tried projecting DE’s “donor” points onto the plane tangent to the current x_i, running mutation & crossover on these projections and then projecting/“retracting” the result back onto the sphere. This seems to minimize the objective too, but again I’m not sure if it’s a fluke.
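For concreteness, the first variant (project after mutation + crossover) looks roughly like this; all names and parameter values are illustrative, not from any package:

```julia
# Rough sketch of "DE/rand/1/bin step, then project onto the sphere".
using LinearAlgebra, Random

project_sphere(x) = x ./ norm(x)

function de_step_projected!(pop, f; F = 0.8, CR = 0.9)
    n = length(pop)
    for i in 1:n
        a, b, c = shuffle(setdiff(1:n, i))[1:3]    # three distinct partners ≠ i
        donor = pop[a] .+ F .* (pop[b] .- pop[c])  # mutation leaves the sphere
        mask = rand(length(donor)) .< CR           # binomial crossover
        trial = project_sphere(ifelse.(mask, donor, pop[i]))
        if f(trial) < f(pop[i])                    # greedy selection
            pop[i] = trial
        end
    end
    return pop
end

# Usage: population of random unit vectors, toy objective with optimum at e₁
pop = [normalize(randn(5)) for _ in 1:20]
f(x) = -x[1]
for _ in 1:200
    de_step_projected!(pop, f)
end
```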
Same with x inside the unit simplex:
\min_x f(x) \quad \text{s.t.} \sum_k x_k = 1, x_k > 0\;\forall k
Are there variants of DE that respect such constraints? I’ve been searching on and off for a couple months, but found literally nothing on this topic.
Sorry, I know it’s off-topic for this thread, but since DE was mentioned, I have to seize the opportunity.
The Manopt.jl package supports optimization on manifolds (such as the unit sphere) and contains several population-based metaheuristic optimizers, such as PSO and CMA-ES.
A simple trick is to just change variables, to
\min_{y \in \mathbb{R}^n} f(y / \Vert y \Vert_2)
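A minimal sketch of the wrapper (the name is illustrative):

```julia
using LinearAlgebra

# Hand spherify(f) to any unconstrained optimizer over ℝⁿ;
# recover the sphere point as x = y/‖y‖₂ from any solution y.
spherify(f) = y -> f(y ./ norm(y))
```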
Yeah, this is very simple; it’s what I was using initially. But I know that f(x) is convex and thus “easy” to optimize, and that the optimum is unique, so I don’t want to lose convexity.
Doesn’t the y/\lVert y\rVert_2 transformation make the function non-convex in y and thus harder to optimize? Perhaps it introduces local minima?
Similarly, I can use the f(\operatorname{softmax}(y)) transformation for the unit simplex case. But then again, I think this makes the resulting function non-convex and potentially arbitrarily complex, since Softmax is non-convex?
A different simple trick is also just to define:
x_n = 1 - \sum_k^{n-1} x_k^2
and then only optimize x_1, \ldots, x_{n-1}, since by definition any choice of those values is on the manifold.
Yet another way is to optimize within the space of stereographic projection.
These helper functions may be of interest.
In practice, we use nonlinear conjugate gradient optimization for some fixed number of “sub” iterations with a fixed stereographic projection axis. The outer loop updates the projection axis as the current manifold point.
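In case it helps, here is a minimal sketch of the fixed-axis coordinates (axis fixed at the north pole e_n; function names are illustrative):

```julia
# Minimal sketch of fixed-axis stereographic coordinates on the unit sphere.

# Inverse projection: map y ∈ ℝⁿ⁻¹ to a point on the sphere Sⁿ⁻¹.
function from_stereographic(y::AbstractVector)
    s = sum(abs2, y)
    return vcat(2 .* y, s - 1) ./ (s + 1)
end

# Projection: map a sphere point x (with x[end] ≠ 1) back to ℝⁿ⁻¹.
to_stereographic(x::AbstractVector) = x[1:end-1] ./ (1 - x[end])

# The inner loop then minimizes y ↦ f(from_stereographic(y)) over ℝⁿ⁻¹.
```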
Concerning convexity, if you minimise a function on the sphere, convexity is tricky.
Globally, any function that is (geodesically) convex on all of the sphere is constant – which is a bit boring optimisation-wise, since every point is a minimiser.
I am a bit biased (developing Manopt.jl and Manifolds.jl) and my knowledge of DE algorithms is a bit vague, but if you determine the direction Y you want to “step into” as a vector tangent to the sphere at your current point x^{(k)} (that is, \langle x^{(k)}, Y \rangle = 0), you could use any retraction, or even the exponential map, which informally means following the great arc that emanates from the point in direction Y for the length \lVert Y\rVert.
Then you do not have to think about constraints at all.
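To illustrate, a hand-written sketch (not the Manopt.jl API):

```julia
using LinearAlgebra

# Exponential map on the unit sphere: follow the great arc from x
# in the tangent direction Y (with ⟨x, Y⟩ = 0) for arc length ‖Y‖.
function sphere_exp(x::AbstractVector, Y::AbstractVector)
    t = norm(Y)
    return iszero(t) ? copy(x) : cos(t) .* x .+ sin(t) .* (Y ./ t)
end

# Turn an arbitrary vector v (e.g. a DE donor) into a tangent direction at x.
tangent_project(x, v) = v .- dot(x, v) .* x
```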
The solution from Chris might also work, but you have to be careful that the values of x_1, \ldots, x_{n-1} still have to be constrained.
You forgot the square root on the right-hand side. But with this approach you have the problem of picking a sign, and of constraining \sum_k^{n-1} x_k^2 \le 1; see also Interior-Point Newton tries points outside of the constraint set
The change of variables x = y / \Vert y \Vert_2 does this automatically, since \nabla_y f(y / \Vert y \Vert_2) \perp y.
You have already lost convexity (unless f is constant) because the feasible set is non-convex.
However, if your f(x) has a unique local (= global) optimum on the unit sphere, then f(y / \Vert y \Vert_2) has a unique line of optima y = \alpha x, i.e. a unique value of x = y / \Vert y \Vert_2. Local optimization will find an arbitrary y on that line, but will give the same x.
PS. As an aside, I would try hard to get a gradient of your function, if possible, so that you can use a better algorithm than DE.
Indeed, according to Boyd & Vandenberghe (section 3.1.1), convexity of a function requires convexity of its domain. The probability simplex \{x : x_k \ge 0, \sum_k x_k = 1\} is convex (it looks like a triangle in 3D), but a sphere (as opposed to a ball, for example) is not, so I did lose convexity by going to the sphere, huh. I totally missed that, even though it’s literally mentioned in the paper [1] I was inspired by. However, their theorem says that if f(x) is convex, then any second-order KKT point z^* of f(z \odot z) with z on the sphere gives the global minimizer x^* = z^* \odot z^*, so I guess it’s not that bad…
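In code, the parametrization from [1] is simply (a sketch; the names are mine):

```julia
# Hadamard parametrization: z on the unit sphere maps to x = z ⊙ z on the
# probability simplex, since Σₖ zₖ² = 1 and zₖ² ≥ 0.
to_simplex(z) = z .* z
hadamard(f) = z -> f(z .* z)   # minimize hadamard(f) over the sphere, recover x = z .* z
```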
I’m using DE because my actual objective function has other unconstrained variables and appears to have many local minima (but it is convex wrt the vector x on the simplex, with other variables fixed; perhaps this convexity doesn’t actually matter because the function isn’t convex in all of its parameters). I’m trying to leverage DE’s ability to explore the loss landscape and maybe find multiple optima simultaneously. Of course, another option is to use gradient-based optimization with multistart, but I feel like it’s slow.
[1] Li, Qiuwei, Daniel McKenzie, and Wotao Yin. “From the Simplex to the Sphere: Faster Constrained Optimization Using the Hadamard Parametrization.” arXiv:2112.05273, August 30, 2022.
In a moderate number of dimensions, I’ve had good success with the MLSL multistart algorithm (which can be combined with a gradient-based local optimizer). One thing to be careful of with multistart algorithms is to tune the convergence tolerance of the local optimizer — you can often get away with a coarse convergence tolerance, and just go back at the end and “polish” your best optimum by running a local optimizer with a smaller tolerance.
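Something along these lines, assuming NLopt.jl (the toy objective, bounds, and tolerances below are placeholders, not a recommendation):

```julia
# Sketch: MLSL multistart with a gradient-based local optimizer via NLopt.jl.
using NLopt, LinearAlgebra

n = 4
b = normalize(ones(n))                 # toy target on the sphere

# Objective in y with the x = y/‖y‖ change of variables; the chain rule
# through y ↦ y/‖y‖ applies (I − x xᵀ)/‖y‖ to ∇ₓf.
function obj(y, grad)
    r = norm(y); x = y ./ r
    gx = 2 .* (x .- b)                 # ∇ₓ of the toy f(x) = ‖x − b‖²
    if length(grad) > 0
        grad .= (gx .- dot(x, gx) .* x) ./ r
    end
    return sum(abs2, x .- b)
end

local_opt = Opt(:LD_LBFGS, n)
local_opt.xtol_rel = 1e-3              # coarse; polish the winner afterwards

opt = Opt(:G_MLSL_LDS, n)
opt.local_optimizer = local_opt
opt.lower_bounds = fill(-2.0, n)
opt.upper_bounds = fill(2.0, n)
opt.maxeval = 2000
opt.min_objective = obj

minf, miny, ret = optimize(opt, rand(n) .+ 0.5)
x̂ = miny ./ norm(miny)                # recover the sphere point
```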
I’ll have a look, thanks!
A simple approach (similar to others here) is to use projected gradient descent:
- Gradient Step: \hat{\boldsymbol{x}}^{k + 1} = \boldsymbol{x}^{k} - \mu {\nabla}_{\boldsymbol{x}} f \left( \boldsymbol{x}^{k} \right).
- Projection Step: \boldsymbol{x}^{k + 1} = {P}_{\mathcal{C}} \left( \hat{\boldsymbol{x}}^{k + 1} \right).
In your case the projection is given by: {P}_{\mathcal{C}} \left( \boldsymbol{x} \right) = \frac{\boldsymbol{x}}{ {\left\| \boldsymbol{x} \right\|}_{2} }.
You can easily add acceleration (FISTA).
It will converge, though with no guarantees of optimality.
Yet in the case of \mathcal{C} being the unit simplex (a convex set), it will converge to the optimal solution.
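A minimal sketch of the sphere case (the step size and toy objective are illustrative):

```julia
using LinearAlgebra

project_sphere(x) = x ./ norm(x)

function pgd(∇f, x0; μ = 0.1, iters = 500)
    x = project_sphere(x0)
    for _ in 1:iters
        x = project_sphere(x .- μ .* ∇f(x))   # gradient step, then projection
    end
    return x
end

# Example: f(x) = ‖x − b‖² on the sphere; the minimizer is b/‖b‖.
b = [3.0, 4.0, 0.0]
x̂ = pgd(x -> 2 .* (x .- b), randn(3))
```

For the simplex case, {P}_{\mathcal{C}} becomes the Euclidean projection onto the simplex, a standard sort-based routine.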