Although I’d like to pose my thoughts and questions about a fairly general class of optimization problems, they grew out of working on the following concrete problem:
The second lens surface is a 2D spline surface. The problem is to optimize the control points of this surface so as to achieve a desired illumination pattern on the detector screen. The illumination pattern is computed via differentiable ray tracing. So we have:
- A set of variables to optimize, given by the z coordinates of the control points
- An objective function, given by the difference between the current and the desired illumination pattern (a minimal sketch of this setup follows below)
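To make the setup concrete, here is a minimal sketch in PyTorch. Everything named in it is an assumption for illustration: `render_pattern` is a dummy differentiable stand-in for the actual ray tracer, `target` is a placeholder for the desired pattern, and the control-point and pixel counts are made up.

```python
import torch

torch.manual_seed(0)
n_ctrl = 32 * 32   # number of spline control points (illustrative value)
n_pix = 64 * 64    # detector pixels (illustrative value)

# Dummy stand-in for the differentiable ray tracer: any differentiable map
# from control-point z coordinates to an illumination pattern will do here.
# The real renderer replaces this; it exists only so the sketch runs.
A = torch.randn(n_pix, n_ctrl) / n_ctrl ** 0.5

def render_pattern(z):
    return torch.sigmoid(A @ z)

target = torch.rand(n_pix)   # desired illumination pattern (placeholder)

def objective(z):
    # difference between the current and the desired illumination pattern
    return torch.mean((render_pattern(z) - target) ** 2)
```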
This defines a smooth non-linear optimization problem, and thus an opportunity to apply one of the many (!) solvers for such problems. I’ve made the interesting observation that if you compare these two approaches:
- Solving the problem directly with the Adam optimizer;
- Defining a densely connected MLP with a few layers of the same width as the number of control points, feeding it a trivial input of 1, using the output as the control point z coordinates, and optimizing the MLP parameters with the Adam optimizer;
then the second approach performs much better, in the sense that it achieves smaller loss values (a sketch of both approaches follows below). If you squint a bit, given the ‘trivial’ use of the MLP, you could say this NN is just a parameter space transformation that is effectively part of the optimizer.
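Here is a sketch of the two approaches, reusing `objective` and `n_ctrl` from the snippet above; the layer count, widths, learning rate, and step count are illustrative rather than tuned values.

```python
import torch
from torch import nn

# Approach 1: optimize the control-point z coordinates directly with Adam.
z = torch.zeros(n_ctrl, requires_grad=True)
opt_direct = torch.optim.Adam([z], lr=1e-2)
for _ in range(2000):
    opt_direct.zero_grad()
    objective(z).backward()
    opt_direct.step()

# Approach 2: produce the z coordinates with a dense MLP fed a constant
# input of 1, and let Adam update the MLP weights instead of z directly.
mlp = nn.Sequential(
    nn.Linear(1, n_ctrl), nn.ReLU(),
    nn.Linear(n_ctrl, n_ctrl), nn.ReLU(),
    nn.Linear(n_ctrl, n_ctrl),
)
opt_mlp = torch.optim.Adam(mlp.parameters(), lr=1e-2)
one = torch.ones(1)   # the ‘trivial’ input

for _ in range(2000):
    opt_mlp.zero_grad()
    objective(mlp(one)).backward()   # z = mlp(one)
    opt_mlp.step()
```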
An obvious difference between these approaches is that the second one optimizes over a higher-dimensional parameter space than the first. But my intuition here is that the second approach ‘sees more opportunities for loss reduction’: a single parameter in a hidden layer of the MLP affects all control points to a greater or lesser extent. So maybe it is helpful to think of each node in a hidden layer as representing a direction in the control point z coordinate space, determined by the parameter values in the following layers, whereas in the plain Adam optimizer approach each parameter corresponds to just one coordinate direction?
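One way to make that intuition slightly more precise, as a rough first-order sketch rather than a real analysis: write the reparameterization as $z = f(\theta)$ with Jacobian $J = \partial z / \partial \theta$. For plain gradient descent with step size $\eta$ on $\theta$, the induced change in the control points is, to first order,

$$
\Delta z \;\approx\; J\,\Delta\theta \;=\; -\eta\, J J^{\top} \nabla_z L,
$$

so the MLP effectively applies a parameter-dependent preconditioner $J J^{\top}$ to the z-space gradient: each hidden weight couples many control points at once, whereas direct optimization moves each z coordinate only along its own axis. Adam replaces the single step size with per-parameter scaling (in $\theta$-space for the MLP, in z-space for the direct approach), but the coupling through $J$ is the same.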
There’s also the question of what kind of cleverness different optimizers employ to find the ‘best’ step direction in parameter space, and how that interacts with the above.
I’d love to hear your thoughts!