Random question about Differential Evolution: do you happen to know if there’s a variant of DE that keeps all vectors of the population on the unit sphere or on the unit simplex?
My optimization problem is of the form:
\min_x f(x) \quad \text{s.t.} \sum_k x_k^2 = 1
so x should be on the sphere. Thus, I can’t use basic DE because the mutation operation will leave the sphere.
Should I just run the usual DE step (mutation + crossover) and then project the result onto the sphere, like in projected gradient descent? I tried it and it kinda works (seems to minimize the objective), but I don’t know if it’s the right way; a rough sketch of what I mean is below.
I also tried projecting DE’s “donor” points onto the plane tangent to the current x_i, running mutation & crossover on these projections and then projecting/“retracting” the result back onto the sphere. This seems to minimize the objective too, but again I’m not sure if it’s a fluke.
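For concreteness, the first variant (project after mutation + crossover) looks roughly like this; all names and parameter values are illustrative, not from any package:

```julia
# Rough sketch of "DE/rand/1/bin step, then project onto the sphere".
using LinearAlgebra, Random

project_sphere(x) = x ./ norm(x)

function de_step_projected!(pop, f; F = 0.8, CR = 0.9)
    n = length(pop)
    for i in 1:n
        a, b, c = shuffle(setdiff(1:n, i))[1:3]    # three distinct partners ≠ i
        donor = pop[a] .+ F .* (pop[b] .- pop[c])  # mutation leaves the sphere
        mask = rand(length(donor)) .< CR           # binomial crossover
        trial = project_sphere(ifelse.(mask, donor, pop[i]))
        if f(trial) < f(pop[i])                    # greedy selection
            pop[i] = trial
        end
    end
    return pop
end

# Usage: population of random unit vectors, toy objective with optimum at e₁
pop = [normalize(randn(5)) for _ in 1:20]
f(x) = -x[1]
for _ in 1:200
    de_step_projected!(pop, f)
end
```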
Same with x inside the unit simplex:
\min_x f(x) \quad \text{s.t.} \sum_k x_k = 1, x_k > 0\;\forall k
Are there variants of DE that respect such constraints? I’ve been searching on and off for a couple months, but found literally nothing on this topic.
Sorry, I know it’s off-topic for this thread, but since DE was mentioned, I have to seize the opportunity.
The Manopt.jl package supports optimization on manifolds (such as the unit sphere) and contains several population-based metaheuristic optimizers, such as PSO and CMA-ES.
A simple trick is to just change variables, to
\min_{y \in \mathbb{R}^n} f(y / \Vert y \Vert_2)
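A minimal sketch of the wrapper (the name is illustrative):

```julia
using LinearAlgebra

# Hand spherify(f) to any unconstrained optimizer over ℝⁿ;
# recover the sphere point as x = y/‖y‖₂ from any solution y.
spherify(f) = y -> f(y ./ norm(y))
```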
Yeah, this is very simple; it’s what I was using initially. But I know that f(x) is convex and thus “easy” to optimize, and that the optimum is unique, so I don’t want to lose convexity.
Doesn’t the y/\lVert y\rVert_2 transformation make the function non-convex in y and thus harder to optimize? Perhaps it introduces local minima?
Similarly, I can use the f(\operatorname{softmax}(y)) transformation for the unit simplex case. But then again, I think this makes the resulting function non-convex and potentially arbitrarily complex, since Softmax is non-convex?
A different simple trick is also just to define:
x_n = 1 - \sum_k^{n-1} x_k^2
and then only optimize x_1, \ldots, x_{n-1}, since by definition any choice of those values is on the manifold.
Yet another way is to optimize within the space of stereographic projection.
These helper functions may be of interest.
In practice, we use nonlinear conjugate gradient optimization for some fixed number of “sub” iterations with a fixed stereographic projection axis. The outer loop updates the projection axis as the current manifold point.
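In case it helps, here is a minimal sketch of the fixed-axis coordinates (axis fixed at the north pole e_n; function names are illustrative):

```julia
# Minimal sketch of fixed-axis stereographic coordinates on the unit sphere.

# Inverse projection: map y ∈ ℝⁿ⁻¹ to a point on the sphere Sⁿ⁻¹.
function from_stereographic(y::AbstractVector)
    s = sum(abs2, y)
    return vcat(2 .* y, s - 1) ./ (s + 1)
end

# Projection: map a sphere point x (with x[end] ≠ 1) back to ℝⁿ⁻¹.
to_stereographic(x::AbstractVector) = x[1:end-1] ./ (1 - x[end])

# The inner loop then minimizes y ↦ f(from_stereographic(y)) over ℝⁿ⁻¹.
```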
Concerning convexity, if you minimise a function on the sphere, convexity is tricky.
Globally, any function that is (geodesically) convex on all of the sphere is constant – which is a bit boring optimisation-wise, since every point is a minimiser.
I am a bit biased (developing Manopt.jl and Manifolds.jl) and my knowledge of DE algorithms is a bit vague, but if you determine the direction Y you want to “step into” as a vector tangent to the sphere at your current point x^{(k)} (that is, \langle x^{(k)}, Y \rangle = 0), you could use any retraction, or even the exponential map, which informally means following the great arc that emanates from the point in direction Y for the length \lVert Y\rVert.
Then you do not have to think about constraints at all.
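To illustrate, a hand-written sketch (not the Manopt.jl API):

```julia
using LinearAlgebra

# Exponential map on the unit sphere: follow the great arc from x
# in the tangent direction Y (with ⟨x, Y⟩ = 0) for arc length ‖Y‖.
function sphere_exp(x::AbstractVector, Y::AbstractVector)
    t = norm(Y)
    return iszero(t) ? copy(x) : cos(t) .* x .+ sin(t) .* (Y ./ t)
end

# Turn an arbitrary vector v (e.g. a DE donor) into a tangent direction at x.
tangent_project(x, v) = v .- dot(x, v) .* x
```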
The solution from Chris might also work, but you have to be careful that the values of x_1, \ldots, x_{n-1} still have to be constrained.
You forgot the square root on the right-hand side. But with this approach you have the problem of picking a sign, and of constraining \sum_k^{n-1} x_k^2 \le 1; see also Interior-Point Newton tries points outside of the constraint set
The change of variables x = y / \Vert y \Vert_2 does this automatically, since \nabla_y f(y / \Vert y \Vert_2) \perp y.
You have already lost convexity (unless f is constant) because the feasible set is non-convex.
However, if your f(x) has a unique local (= global) optimum on the unit sphere, then f(y / \Vert y \Vert_2) has a unique line of optima y = \alpha x, i.e. a unique value of x = y / \Vert y \Vert_2. Local optimization will find an arbitrary y on that line, but will give the same x.
PS. As an aside, I would try hard to get a gradient of your function, if possible, so that you can use a better algorithm than DE.
Indeed, according to Boyd & Vandenberghe (section 3.1.1), convexity of a function requires convexity of its domain. The probability simplex \{x : x_k \ge 0, \sum_k x_k = 1\} is convex (it looks like a triangle in 3D), but a sphere (as opposed to a ball, for example) is not, so I did lose convexity by going to the sphere, huh. I totally missed that, even though it’s literally mentioned in the paper [1] I was inspired by. However, their theorem says that if f(x) is convex, then any second-order KKT point z^* of f(z \odot z) with z on the sphere gives the global minimizer x^* = z^* \odot z^*, so I guess it’s not that bad…
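In code, the parametrization from [1] is simply (a sketch; the names are mine):

```julia
# Hadamard parametrization: z on the unit sphere maps to x = z ⊙ z on the
# probability simplex, since Σₖ zₖ² = 1 and zₖ² ≥ 0.
to_simplex(z) = z .* z
hadamard(f) = z -> f(z .* z)   # minimize hadamard(f) over the sphere, recover x = z .* z
```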
I’m using DE because my actual objective function has other unconstrained variables and appears to have many local minima (but it is convex wrt the vector x on the simplex, with other variables fixed; perhaps this convexity doesn’t actually matter because the function isn’t convex in all of its parameters). I’m trying to leverage DE’s ability to explore the loss landscape and maybe find multiple optima simultaneously. Of course, another option is to use gradient-based optimization with multistart, but I feel like it’s slow.
[1] Li, Qiuwei, Daniel McKenzie, and Wotao Yin. “From the Simplex to the Sphere: Faster Constrained Optimization Using the Hadamard Parametrization.” arXiv:2112.05273, August 30, 2022.
In a moderate number of dimensions, I’ve had good success with the MLSL multistart algorithm (which can be combined with a gradient-based local optimizer). One thing to be careful of with multistart algorithms is to tune the convergence tolerance of the local optimizer — you can often get away with a coarse convergence tolerance, and just go back at the end and “polish” your best optimum by running a local optimizer with a smaller tolerance.
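Something along these lines, assuming NLopt.jl (the toy objective, bounds, and tolerances below are placeholders, not a recommendation):

```julia
# Sketch: MLSL multistart with a gradient-based local optimizer via NLopt.jl.
using NLopt, LinearAlgebra

n = 4
b = normalize(ones(n))                 # toy target on the sphere

# Objective in y with the x = y/‖y‖ change of variables; the chain rule
# through y ↦ y/‖y‖ applies (I − x xᵀ)/‖y‖ to ∇ₓf.
function obj(y, grad)
    r = norm(y); x = y ./ r
    gx = 2 .* (x .- b)                 # ∇ₓ of the toy f(x) = ‖x − b‖²
    if length(grad) > 0
        grad .= (gx .- dot(x, gx) .* x) ./ r
    end
    return sum(abs2, x .- b)
end

local_opt = Opt(:LD_LBFGS, n)
local_opt.xtol_rel = 1e-3              # coarse; polish the winner afterwards

opt = Opt(:G_MLSL_LDS, n)
opt.local_optimizer = local_opt
opt.lower_bounds = fill(-2.0, n)
opt.upper_bounds = fill(2.0, n)
opt.maxeval = 2000
opt.min_objective = obj

minf, miny, ret = optimize(opt, rand(n) .+ 0.5)
x̂ = miny ./ norm(miny)                # recover the sphere point
```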
I’ll have a look, thanks!
A simple approach (similar to others here) is to use projected gradient descent:
- Gradient Step: \hat{\boldsymbol{x}}^{k + 1} = \boldsymbol{x}^{k} - \mu {\nabla}_{\boldsymbol{x}} f \left( \boldsymbol{x}^{k} \right).
- Projection Step: \boldsymbol{x}^{k + 1} = {P}_{\mathcal{C}} \left( \hat{\boldsymbol{x}}^{k + 1} \right).
In your case the projection is given by: {P}_{\mathcal{C}} \left( \boldsymbol{x} \right) = \frac{\boldsymbol{x}}{ {\left\| \boldsymbol{x} \right\|}_{2} }.
You can easily add acceleration (FISTA).
It will converge, though with no guarantees of optimality.
Yet in the case of \mathcal{C} being the unit simplex (a convex set), it will converge to the optimal solution.
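A minimal sketch of the sphere case (the step size and toy objective are illustrative):

```julia
using LinearAlgebra

project_sphere(x) = x ./ norm(x)

function pgd(∇f, x0; μ = 0.1, iters = 500)
    x = project_sphere(x0)
    for _ in 1:iters
        x = project_sphere(x .- μ .* ∇f(x))   # gradient step, then projection
    end
    return x
end

# Example: f(x) = ‖x − b‖² on the sphere; the minimizer is b/‖b‖.
b = [3.0, 4.0, 0.0]
x̂ = pgd(x -> 2 .* (x .- b), randn(3))
```

For the simplex case, {P}_{\mathcal{C}} becomes the Euclidean projection onto the simplex, a standard sort-based routine.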