[ANN] OptimKit.jl – A blissfully ignorant Julia package for gradient optimization

OptimKit.jl v0.1.0 was just registered overnight. It is a package for gradient-based optimization, and currently supports gradient descent, conjugate gradient and LBFGS.

Why did I create another package for this, rather than contributing to alternatives that are already around? Similar to how KrylovKit.jl relates to IterativeSolvers.jl, I did not find the existing packages like Optim.jl (which is otherwise great) sufficiently flexible for my use cases. In particular, I dislike the assumption that parameters should be captured in some AbstractVector or AbstractArray subtypes (especially for optimization problems where this assumption is even mathematically too restrictive, as they can be part of any kind of manifold). Julia allows for extremely generic programming, but unfortunately this is not always reflected in the corresponding packages, I assume because of baked-in tradition from how these methods (optimization methods, iterative methods, …) are implemented and provided in conventional languages.

In particular, OptimKit.jl does not assume that your parameters are some subtype of AbstractArray, it does not assume that your gradients are some subtype of AbstractArray, and it does not assume that everything takes place in Euclidean space. It implements generalized versions of gradient descent, conjugate gradient and LBFGS, as they are being investigated and formulated in the context of Riemannian optimization.

If your objective function and gradient are computed by a function fg(x), where your parameters x are a simple vector, that’s great, just do

using OptimKit
optimize(fg, x0, LBFGS())

and you’re good to go; results should be similar to Optim.jl.
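For a concrete picture of such an fg, here is a minimal sketch that assumes a quadratic objective f(x) = 1/2 x'Ax - b'x. The helper name make_quadratic_fg and the matrices are made up for illustration; the only thing taken from the post is that fg returns the function value and the gradient together.

```julia
using LinearAlgebra

# Hypothetical helper: builds fg for f(x) = 1/2 * x' * A * x - b' * x,
# with A assumed symmetric positive definite.
function make_quadratic_fg(A, b)
    function fg(x)
        Ax = A * x
        f = 0.5 * dot(x, Ax) - dot(b, x)   # objective value
        g = Ax - b                         # gradient
        return f, g
    end
    return fg
end

quad_A = [4.0 1.0; 1.0 3.0]
quad_b = [1.0, 2.0]
quad_fg = make_quadratic_fg(quad_A, quad_b)

x0 = zeros(2)
f0, g0 = quad_fg(x0)

# With OptimKit installed, this is what would be passed on:
# using OptimKit
# optimize(quad_fg, x0, LBFGS())
```

The exact minimizer here is `quad_A \ quad_b`, which gives a quick way to check whatever the optimizer returns.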

Do you have a complicated model parameterized by a bunch of different variables of different types a, b, c, … which you do not want to wrap in a long vector (note that the parameters can live on a manifold and thus do not even need to exhibit the properties of a vector)? Do you update these parameters in a given tangent direction according to a specific recipe? Is your corresponding gradient direction encoded by one or more objects of custom types? Do you want to specify a specific inner product for these gradients? That’s all great. Just do

x0 = (a0, b0, c0, ...)
optimize(fg, x0, algorithm;
    retract = # some method that tells you how to update the parameters in a given direction,
    inner = # some method that computes the inner product between two tangent directions at the given point x,
    add! = # some method that implements the linear combination of tangent directions,
    transport! = # some method that transports tangent directions,
    precondition = # some preconditioner
)
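To make the template less abstract, here is a rough sketch for the simplest non-vector case: parameters stored as a tuple of arrays of different shapes, all living in flat space. The function names and, more importantly, the argument orders and return values are assumptions made for this illustration; the README is the place to check what OptimKit actually expects from each callback.

```julia
using LinearAlgebra

# Parameters are a tuple of differently shaped pieces; tangent directions mirror that shape.
x0 = (zeros(3), zeros(2, 2))

# ASSUMED signature: step from x along direction η with step length α.
# Returns the new point and the tangent direction at that point
# (unchanged here, because every piece lives in a flat space).
tuple_retract(x, η, α) = (map((xi, ηi) -> xi + α * ηi, x, η), η)

# ASSUMED signature: inner product of two tangent directions at x.
# dot on matrices is the Frobenius inner product.
tuple_inner(x, ξ1, ξ2) = sum(map(dot, ξ1, ξ2))

# ASSUMED signature: in-place update ξ1 <- ξ1 + β * ξ2, piece by piece.
function tuple_add!(ξ1, ξ2, β)
    foreach((a, b) -> axpy!(β, b, a), ξ1, ξ2)
    return ξ1
end
```

Under those assumptions, the functions would be handed over as `retract = tuple_retract, inner = tuple_inner, add! = tuple_add!`.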

The choice for making all of these methods keyword arguments is to easily experiment with different choices. Maybe your parameters x are a simple Vector, and so are your gradients and tangent directions, but you nonetheless want to use a different inner product. Or your parameters are unitary matrices, and you want to experiment with different retraction schemes. No need to redefine methods, or implement different wrapper types with altered method definitions. Just pass along the relevant method via the keyword arguments.
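As an illustration of the unitary case, the sketch below writes down two interchangeable retraction schemes. It assumes tangent directions at a unitary U are represented as U * G with G skew-Hermitian, and it only computes the new point; the function names and the three-argument form are made up for this example and are not OptimKit's callback format.

```julia
using LinearAlgebra

# Tangent directions at a unitary U are written as U * G with G skew-Hermitian (G' == -G).

# Scheme 1: move along U * exp(α * G) using the matrix exponential.
retract_exp(U, G, α) = U * exp(α * G)

# Scheme 2: take a straight step in the embedding space, then map back
# to the nearest unitary matrix through the polar decomposition (via SVD).
function retract_polar(U, G, α)
    F = svd(U + α * U * G)
    return F.U * F.Vt
end

U = Matrix{Float64}(I, 2, 2)
G = [0.0 1.0; -1.0 0.0]
V_exp = retract_exp(U, G, 0.3)
V_polar = retract_polar(U, G, 0.3)
```

Both results stay unitary and agree to first order in α, so swapping one for the other changes the path taken but not the tangent direction at the starting point.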

See the README for more info.


Thanks, that looks great! I particularly enjoy your “look ma no types” approach, and your code is super nice and concise (also in KrylovKit).

From a user perspective there are now at least three packages with strongly overlapping functionalities: Optim, Manopt, and OptimKit. Any chance we can get you, @pkofod and @kellertuer in a room together and agree on some code sharing? In particular I know that @pkofod wanted to loosen the type signatures in Optim and simplify some of its codebase, so maybe OptimKit would be a good starting point to start adding some of Optim’s features (like second-order methods).


This looks like a reasonable package, nice! It’s always good to see the optimization community growing and also work on manifolds of course.

How do you plan to provide manifolds for your approach?

Currently, I am working with a student on an LBFGS in Manopt.jl, which currently focusses on non-smooth optimization methods on manifolds, but also includes Nelder-Mead and a gradient descent, for example. It is based on the ManifoldsBase.jl interface to use the manifolds in Manifolds.jl.

I just took a short look, and what I am not much in favour of is the explicit verbosity -> print something style. That’s why Manopt.jl takes a different approach from both Optim and OptimKit:

  • one implements a single iteration instead of the whole while loop, this way one decouples the step itself and its surrounding (debug, stopping criterion etc)
  • debug is done by decorating the options and a dispatch
  • the optimise function is called solve in Manopt.jl, but handles the while (and dispatches the inner iteration)

The parameters in Manopt.jl are just an arbitrary struct inheriting from Options (changing data), while the function and the manifold are stored in a Problem (static data), which follows the idea from manopt and pymanopt.
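The decoupling described above can be sketched in a few lines of plain Julia. All type and function names below are hypothetical and chosen for this illustration; this is not Manopt.jl's actual API, only the pattern of a single-step function, a surrounding loop, and a split between static and changing data.

```julia
using LinearAlgebra

# Static data: the function that returns (f, g). Hypothetical name.
struct DescentProblem{F}
    fg::F
end

# Changing data: current iterate, step size and iteration counter. Hypothetical name.
mutable struct DescentState{T}
    x::T
    stepsize::Float64
    iteration::Int
end

# One gradient descent step, with no knowledge of stopping rules or printing.
function step!(problem::DescentProblem, state::DescentState)
    _, g = problem.fg(state.x)
    state.x = state.x - state.stepsize * g
    state.iteration += 1
    return state
end

# The surrounding loop owns the stopping criterion and the optional debug hook.
function solve!(problem, state; maxiter = 1000, tol = 1e-8, debug = s -> nothing)
    while state.iteration < maxiter
        _, g = problem.fg(state.x)
        norm(g) < tol && break
        step!(problem, state)
        debug(state)
    end
    return state
end
```

For example, `solve!(DescentProblem(fg), DescentState(x0, 0.5, 0); debug = s -> println(s.iteration))` would print the iteration counter without the step itself knowing anything about output.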

It would be great to discuss such modelling choices together.


Yes, it is true that it is nicer to implement the optimization algorithms using an iterator interface. I guess this should not be too hard; it’s just that I did not need it for the stuff I was using this for. There is, however, also a finalize function (keyword argument) which is called after every iteration step and can be used for custom printing, plotting, additional steps, and so on. So one does not need to use the built-in verbosity flag.
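A callback of that kind could look roughly like the sketch below, which records the objective value after every step. The argument list (current point, value, gradient, iteration counter) and the returned triple are assumptions made for this illustration, so the README should be consulted for the form the keyword argument really expects.

```julia
# Collects the objective value after every iteration.
history = Float64[]

# ASSUMED signature: receives the current point, value, gradient and iteration
# counter, and hands back the (possibly modified) x, f, g.
function record_progress(x, f, g, numiter)
    push!(history, f)
    return x, f, g
end

# Simulate what one call after the first iteration would do.
record_progress([1.0], 2.0, [0.5], 1)
```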

Note that optimization or numerical linear algebra is not my area of expertise, so I am developing these packages (i.e. OptimKit or KrylovKit) out of necessity because I find it easier than forcing the flexibility I needed out of the existing packages.

As for how to implement manifolds, I think one just needs to specify the correct definitions for retractions, inner product, transport and that’s it. Well, one would also need to manually include the correct tangent space projector in the definition of the gradient in the fg routine. I could imagine that one can easily write a convenience layer over OptimKit.jl, such that for some arbitrary user function fg, one has

function optimize(fg, x0, manifold, algorithm)
    # wrap fg so that the gradient is projected onto the tangent space at x
    function fgprojected(x)
        f, g = fg(x)
        return f, project(manifold, x, g)
    end
    return optimize(fgprojected, x0, algorithm; retract = ..., inner = ..., transport = ...)
end
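To show what the projection step could amount to, here is a small sketch for the unit sphere embedded in R^n. The UnitSphere type, this project method and the projected_fg wrapper are hypothetical and exist only for this example; they are not part of OptimKit.jl or of any manifold package.

```julia
using LinearAlgebra

# Hypothetical manifold type: the unit sphere embedded in R^n.
struct UnitSphere end

# Orthogonal projection of an ambient vector g onto the tangent space at x,
# i.e. remove the component of g along x (assumes norm(x) == 1).
project(::UnitSphere, x, g) = g - dot(x, g) * x

# Wrap a user-supplied fg so that it returns the projected gradient.
function projected_fg(fg, manifold)
    return function (x)
        f, g = fg(x)
        return f, project(manifold, x, g)
    end
end

# Linear objective f(x) = c' * x with ambient gradient c.
c = [1.0, 2.0, 3.0]
linear_fg(x) = (dot(c, x), c)
sphere_fg = projected_fg(linear_fg, UnitSphere())
```

At the point `[1.0, 0.0, 0.0]` the wrapped function returns a gradient with no component along x, which is what a tangent direction on the sphere requires.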

In the applications where we are currently using this, we wanted to be able to quickly experiment with different retraction and transport schemes and possibly custom inner products, so that’s why I went for maximal flexibility.


Thanks for the explanations. In ManifoldsBase.jl we use AbstractRetractions to distinguish different retractions, AbstractVectorTransports for different vector transports, and MetricManifold for different metrics, so that should also be quite flexible, but you’re right that setting this up might get in the way of a quick experiment at first.

Concerning the projection, yes that would be one way to go, assuming you have an embedded manifold and a projection available.


Was your example meant to say

using OptimKit
optimize(fg, x0, LBFGS())


I know OptimKit.jl from before it was registered, and I admire your work here as well as in KrylovKit.jl. Unfortunately I’ve had to move my own work down my list of priorities due to work that pays the bills :-)

I have not read the code again, but I’m pretty sure I stole a few code patterns from you, so maybe we can share abstractions at some point. It does not seem as if you’re too interested in being bound by other packages, and I can 100% respect that if that’s the case.


Good catch :-), I’ve changed the original post. I am certainly open to collaborations, especially since I am far from being an expert on these topics.


How do I install OptimKit?

I think JuliaPro has a specific package registry in which only a curated set of packages are available. I have no experience with it; I always use standard Julia.