[ANN] ProximalAlgorithms v0.5.0: documentation and ecosystem update

I’m happy to announce that a ProximalAlgorithms v0.5.0 was released! This new version comes with some breaking changes, and significant code overhaul. Most importantly, it comes with documentation, which was long overdue! A big thanks goes to the contributors and reviewers that helped pushing this release over the finish line in the past few months.

ProximalAlgorithms allows solving a variety of optimization problems where non-differentiable terms occur in the objective. Constrained problems are an example of this, in which the indicator function of the feasible set is included in the objective. Other examples include models in which non-differentiable losses or regularization terms are used.

Here is a summary of the changes that went into this release.

Introducing ProximalCore

The core API for proximal minimization is now defined in the new, extremely lightweight ProximalCore package, which ProximalAlgorithms depends on. This way one can easily define the proximal mapping for custom types, as explained here and have ProximalAlgorithms work fine with them, without the necessity to depend on ProximalOperators.

ProximalOperators v0.15

The latest version of ProximalOperators depends on ProximalCore, and implements the interface there defined: If you wish to use the function types defined in ProximalOperators, don’t forget to update it to the recently released v0.15.0 in order for it to work with the latest ProximalAlgorithms release.

Algorithms, algorithms, algorithms

Significant work went into refactoring the algorithms code, but some new algorithms were also added. The fleet of available algorithms now include:

• Douglas-Rachford splitting
• Three-term splitting
• Primal-dual splitting algorithms
• Newton-type methods (based both on proximal gradient and Douglas-Rachford)

Here is a list of the implemented algorithms, and the types of problems they target.

Here is a quick introduction on the interface to algorithms, with some example.

Zygote as default backend for gradients

While the ProximalCore API allows to explicitly define the gradient of a custom function type, it is only natural to fall back to using the amazing AD ecosystem that Julia has to offer. Therefore, ProximalAlgorithms falls back to Zygote whenever gradients are to be computed. This means that one can provide a regular Julia functions where only gradients are needed, as one would naturally do, with no need to define a custom type like it is needed for proximal mappings instead. This simple example shows this in action.

In the future, it will make sense to add support for multiple AD backends, so that one can easily switch from Zygote to ReverseDiff, Enzyme, Diffractor, Yota, Nabla… whatever works best. If anyone is interested in this, feel free to reach out here or through issues.

13 Likes

I’m not an expert in proximal algorithms, but used ISTA/FISTA before in other languages. These are basically the [accelerated] proximal gradient methods, but don’t see them on the algorithms page Problem types and algorithms · ProximalAlgorithms.jl. Are [F]ISTA really not implemented (maybe less efficient than others), or they are also known and listed by another name?

They are listed as ForwardBackward and FastForwardBackward, respectively, because they correspond to the so-called forward-backward splitting algorithm for solving monotone inclusion problems. They also have aliases as ProximalGradient and FastProximalGradient, which are probably better names, since that’s how they are known in the optimization community.

Edit: I changed the name of the algorithms in the table to “proximal gradient” and “fast proximal gradient” so that this is more apparent. It would probably make sense to also rename functions and types names accordingly.

1 Like

Thanks! I guessed that they must be there by a different name, but didn’t recognize forward-backward splitting.

Small update here: we released a series of bugfix releases (latest is now 0.5.3) with some minor bugfix, and improved performance of some of the algorithms (reduced allocations).

1 Like

this is a really nice package and yields a really nice overview. Thanks for the package. Especially with ProximalCore I will take a closer look whether I can employ than for my projects on Manifolds (you might recognise the name of an algorithm here Douglas–Rachford · Manopt.jl and for Chambolle-Pock · Manopt.jl - ah you even use the same name here as well - nice!).

thanks for this nice package and the documentation!

1 Like

Thank you! The documentation of Manopt is on another league though

Yes, I think the idea of a “core” package makes sense in many cases, probably also for manifold optimization primitives. The main reason is decoupling, since some projections in ProximalOperators started being somewhat “heavy” on dependencies (maybe requiring a QP solver, who knows what in the future).

For now it works fine (the API defined there is really minimal) I just hope it doesn’t end up being a mess to manage!

1 Like

Wait: doesn’t ManifoldBase essentially play the same role as ProximalCore? That is, defining primitives to work with manifolds.

To some extend ManifoldsBase.jl does something similar, yes, it defines primitives / an interface for manifolds, such that you can implement something (form example optimisation algorithms) without the specific manifold in mind or even loaded. Manopt.jl is completely decoupled from Manifolds.jl (but both depend/use the interface - exactly.

It might still be interesting to check whether I can use your core package as well.

By the way - one (though rather simple) proximal algorithm not yet covered in your package is the cyclic proximal point algorithm (for a large sum of functions with easy proximal maps).

1 Like

Would that be eq 9 and 10 from https://web.mit.edu/dimitrib/www/Incremental_Proximal.pdf (with cyclic indices)?

No, even simpler, without the gradient. So if you have f(x) = \displaystyle\sum_{k=1}^n f_i(x) you would just iterate x_k = \operatorname{prox}_{\lambda_k f_{i_k}}(x_k) where \lambda_k is a decreasing (summable but not square summable) sequence and the i_k are as in your reference for example cyclic or random.

Edit: Or short: You do not have the h_i s

Let’s say, the h_i are identically zero, and X = R^n

Edit: yes, proposition 6 shows the convergence with such stepsize choice.

Exactly. The convergence on Hadamard manifolds can be found in https://arxiv.org/pdf/1210.2145.pdf

1 Like

One way I like to implement such an algorithm is: just implement the proximal point algorithm (with variable stepsize) and let the objective be “stochastic” (or cyclic in this case).

It’s like gradient descent. There’s no such a thing as stochastic gradient descent (as an algorithm): there’s gradient descent, and there are stochastic objective functions.

But maybe this is just my personal taste

Interesting, for me the order or stochastic part is part of the algorithm (similarly for gradient descent and its batch/stochastic counterpart) and not of the function / gradient/ proxes involved, so it is indeed a matter of taste or interpretation.