Gradient-Free Neural Network Optimization

I have a neural network that is real-valued (i.e., it maps into \mathbb{R}, with inputs in \mathbb{R}^{n}), but it has complex weights. Consequently, taking gradients of the loss function means complex differentiation, and the lack of analyticity gets in the way (i.e., the loss is not necessarily complex-differentiable at all). I am using Flux, and all of the built-in optimizers appear to be gradient-based; they seem to be causing issues in my application, I suspect for this reason. Are there packages, either plug-and-play or close to it, that implement non-gradient-based methods (e.g., trust region or otherwise), and does anybody have experience with training networks like this?

Using Optimization.jl is probably the easiest way to do this. It has wrappers for many derivative-free methods; NLopt's NLopt.LN_COBYLA() is one I've had a good amount of success with. See:

https://docs.sciml.ai/Optimization/stable/optimization_packages/nlopt/

I generally only use derivative-free methods with neural networks as a way to test things, but you can definitely do it. It won't be as fast, but it's "fine".
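A minimal sketch of what that looks like, following the pattern in the linked docs: wrap the loss in an OptimizationProblem and hand NLopt.LN_COBYLA() to solve. The toy loss, initial guess, and iteration cap here are placeholders, not anything specific to your model.

```julia
using Optimization, OptimizationNLopt

# Toy scalar loss over a real parameter vector (a complex-weighted network
# would first need its weights flattened into real components).
loss(θ, p) = sum(abs2, θ .- p)

θ0 = zeros(4)               # initial guess -- a warm start would go here
p  = [1.0, 2.0, 3.0, 4.0]   # fixed problem data

f    = OptimizationFunction(loss)            # no gradient information needed
prob = OptimizationProblem(f, θ0, p)
sol  = solve(prob, NLopt.LN_COBYLA(); maxiters = 2_000)

sol.u          # optimized parameters
sol.objective  # final loss value
```

Since COBYLA only ever evaluates the objective, the question of complex differentiability never comes up.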


Thanks, Chris!

Yeah, it doesn't need to be fast. I'm using the neural net as a refinement on top of a faster approximation method that gives me a substantial "warm start", so I'm mostly concerned with making sure bad gradients don't push me back out of the neighborhood I'm supposed to be in.

Just note that if you do this with Flux, you need to use restructure/destructure to get a flat parameter vector the optimizer can work on. It's a bit easier to use Lux.jl for this kind of thing, though the Flux route isn't hard either.
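For concreteness, here's a hedged sketch of the restructure/destructure pattern with a complex-weighted Flux model. The architecture, dummy data, and the choice of taking the real part as the network's real-valued output are all stand-ins, not the original poster's setup; since COBYLA optimizes over real vectors, the complex flat vector is split into real and imaginary halves and recombined inside the loss.

```julia
using Flux, Optimization, OptimizationNLopt

ctanh(z) = tanh(z)   # plain elementwise tanh, works for complex arguments

# Stand-in architecture with complex weights (Dense accepts an explicit
# weight matrix and bias vector); replace with your own model.
W1, b1 = randn(ComplexF64, 8, 3), zeros(ComplexF64, 8)
W2, b2 = randn(ComplexF64, 1, 8), zeros(ComplexF64, 1)
model = Chain(Dense(W1, b1, ctanh), Dense(W2, b2))

flat, re = Flux.destructure(model)   # complex flat vector + rebuilder
n = length(flat)

# COBYLA works on real vectors, so stack [real; imag] and recombine
# before rebuilding the model.
x0 = vcat(real.(flat), imag.(flat))
rebuild(x) = re(complex.(x[1:n], x[n+1:2n]))

X = rand(3, 32)   # dummy inputs
y = rand(1, 32)   # dummy real-valued targets

# Real-valued loss: take the real part of the (complex) network output.
loss(x, _p) = sum(abs2, real.(rebuild(x)(X)) .- y) / length(y)

prob = OptimizationProblem(OptimizationFunction(loss), x0)
sol  = solve(prob, NLopt.LN_COBYLA(); maxiters = 5_000)

trained = rebuild(sol.u)   # model with the optimized complex weights
```

With Lux the same idea applies, but the parameters already live outside the model (and can be flattened with something like ComponentArrays), which is why it's a bit more convenient here.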
