A community convention for target function return values

In many packages, we deal with target functions, in either linear or log-space, that then become the object of optimization, statistical density sampling (especially in Bayesian statistics), machine learning, and so on.

Currently, packages have different conventions/APIs for users to implement such target functions - often just a simple function returning a value (trusting the user to know whether that should be a lin or log value). If there’s a gradient, returning a tuple of value and gradient is also common, but not universal.

In particular, when using plain values, there’s always the danger that a less experienced user may return a linear value instead of the log-density value that a density sampler expects.

I’d like to propose a very simple convention that packages could support (in addition to their own way of doing things) to let the users/packages express their intentions in a common way, without taking on any dependencies, by using the power of NamedTuples.

This isn’t really special in any way in itself - but it might prove quite powerful if it became a convention that many packages support.

Proposal: Target functions return a NamedTuple

(linval = ...,)

or

(logval = ...,)

to indicate whether the result is in lin- or log-space. If the function computes a custom gradient (no auto-diff), it returns

(linval = ..., grad_linval = [...])

or

(logval = ..., grad_logval = [...])

The field names are up for discussion, of course - they could be, e.g., logval, logd (for log-density), logresult, etc. They should just be short and clear, and a convention we decide to adopt.
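
To make this concrete, here is a minimal sketch of what a target function following this convention could look like (the function name is just a placeholder, and the hand-coded gradient is only for illustration):

# Hypothetical log-density of a standard normal in n dimensions,
# following the proposed NamedTuple convention with a hand-coded gradient:
function my_log_density(x::AbstractVector{<:Real})
    logval = -0.5 * sum(abs2, x) - 0.5 * length(x) * log(2π)
    grad_logval = -x   # gradient of the log-density with respect to x
    return (logval = logval, grad_logval = grad_logval)
end

my_log_density([0.5, -1.0])   # returns (logval = ..., grad_logval = [-0.5, 1.0])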

The algorithm evaluating the target function can then either complain (“you need to give me the log of the density here!”) or convert as appropriate.

This would also be implicitly compatible with the (value, gradient) tuple that is commonly used in many packages, if they do something like value, gradient = result internally.
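
For example (just an illustration): since NamedTuples iterate over their values, positional destructuring works, though it does rely on the value field coming first:

result = (logval = -3.2, grad_logval = [0.1, -0.4])
value, gradient = result   # value == -3.2, gradient == [0.1, -0.4]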

The nice thing about using NamedTuples would be that it’s easily extensible. For example, in expensive calculations (e.g. complex likelihood functions, etc.), a user/application may often need to keep track of intermediate/auxiliary results - sometimes for debugging and verification, sometimes also as an important part of later analysis.

So for a use case like MCMC sampling, we would allow the target function to return

(logval = ..., aux = (my_intermediate_sum = ..., some_other_value = ...))
(logval = ..., grad_logval = [...], aux = (my_intermediate_sum = ..., some_other_value = ...))

or maybe even something like (no requirement to use aux explicitly)

(logval = ..., my_intermediate_sum = ..., some_other_value = ...)

The sampler, optimizer, etc. would then pass this through to its output (assuming the algorithm’s output structure provides a way to store auxiliary information).
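
As a rough sketch of the consumer side (all names here are made up, not an actual sampler API), an MCMC-like loop could simply stash whatever aux field it finds alongside each sample:

# Hypothetical sketch: record a sample and keep any auxiliary info the target returned.
function record_sample!(samples, aux_log, x, result::NamedTuple)
    push!(samples, (x = copy(x), logval = result.logval))
    haskey(result, :aux) && push!(aux_log, result.aux)
    return nothing
end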

I don’t think we should require a specific order in the NamedTuple, long term. With a bit of generated-function magic we can extract what we need, check whether there’s a gradient, and then dispatch in a type-stable fashion (and automatically auto-diff if necessary). We would probably run the target function once first (kind of like broadcast does to determine its result type). We could put together a lightweight package “TargetFunctionConventions.jl” (or so) that packages can use internally, so we don’t have to duplicate that kind of value-extraction/canonicalization code. This would be for algorithm implementations to use internally; the package providing the target function wouldn’t need to depend on it. If there’s interest in this proposal, I’d volunteer to put something lightweight and (at least almost) dependency-free together.
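
A minimal sketch of the kind of value-extraction/canonicalization helpers such a package could provide (all names are hypothetical, and the AD fallback is only indicated, not implemented):

# Hypothetical: extract a log-value from a convention-following NamedTuple,
# complaining or converting as appropriate.
function extract_logval(result::NamedTuple)
    if haskey(result, :logval)
        return result.logval
    elseif haskey(result, :linval)
        result.linval > 0 || throw(DomainError(result.linval, "linear value must be positive to take its log"))
        return log(result.linval)
    else
        throw(ArgumentError("result must contain a logval or linval field"))
    end
end

# Hypothetical: return the supplied gradient, otherwise fall back to AD
# (e.g. ForwardDiff or Zygote) on the target function f at x.
function extract_grad_logval(f, x, result::NamedTuple)
    haskey(result, :grad_logval) && return result.grad_logval
    error("no grad_logval supplied; an AD fallback would go here")
end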

I’d love to hear your thoughts on this.

CC @Tamas_Papp, @cpfiffer, @Kai_Xu, @yebai, @ChrisRackauckas (and I’ll have forgotten many others here, my apologies!)

Instead of a NamedTuple, most packages should be outputting result structs, which is essentially the same thing but allows a few other nice dispatches. But I would prefer the interface be on functions, i.e. value(res) instead of on properties, i.e. res.value.


I am not sure we need a common convention, as conversions should be trivial. Also, I would rather have the user convert between log and linear values when applicable. TBH, for MCMC a non-log PDF just does not make sense numerically, and it is not something one can recover from with a simple log, as most of the precision is already lost to under/overflow.

If anything, I would not stick to a NamedTuple, just support getproperty or accessor functions as @ChrisRackauckas suggested.

As for auxiliary/incidental computations, it is kind of tricky to decide what should be done with them, or if they are needed. Keeping e.g. 1024 of them around for a NUTS tree of depth 10 may be excessive; I would rather allow them to be calculated on demand with a closure, similarly to how Zygote does backpropagation.
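
For instance (purely illustrative, not an actual API), the result could carry a closure that computes the auxiliary data only when it is actually requested:

# Hypothetical: expensive auxiliary data is wrapped in a closure and only
# computed if the consumer decides it wants it.
function target_with_lazy_aux(x)
    logval = -0.5 * sum(abs2, x)
    make_aux() = (sum_x = sum(x), max_abs_x = maximum(abs, x))   # runs only on demand
    return (logval = logval, aux = make_aux)
end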

For my own packages, related discussion was in

I am planning to redesign the API soon, in a single breaking change.

@ChrisRackauckas: Instead of a NamedTuple, most packages should be outputting result structs, which is essentially the same thing but allows a few other nice dispatches

Sure, as long as the struct is owned by the package that creates the target function, that would be fine too, of course. Mainly I want to enable code to implement target functions that will work with various packages later on, without depending on those algorithm packages just to define/implement the target function itself.

But I would prefer the interface be on functions, i.e. value(res) instead of on properties, i.e. res.value.

Oh, good point, a common API that works like that would be even better than a convention on NamedTuples. But we’ll need a very lightweight package that defines those functions (in BAT.jl, I now have logvalof and logvalgradof, for example). And algorithm packages will need to support it. If we could come up with a common standard here, that would be great!

One question is how to support auxiliary output - of course we could also define an “official” getauxinfo function, or so.

@Tamas_Papp I am not sure we need a common convention, as conversions should be trivial. Also, I would rather have the user convert between log and linear values when applicable. TBH, for MCMC a non-log PDF just does not make sense numerically, and it is not something one can recover from with a simple log, as most of the precision is already lost to under/overflow.

It’s not so much about the auto-conversion, but about defending against the user passing the wrong kind of value. For example, in BAT.jl I have a PosteriorDensity(likelihood, prior). It’s meant to accept pretty much anything as a likelihood - a function, a distribution, etc. - but I do want to defend against (possibly less experienced) users passing a function that produces a linear value. So by returning some kind of wrapper type (kind of like a unit), the user makes an explicit statement: “this is a log value”. Also, this way a likelihood can just be called that. It’s just weird having PosteriorDensity(likelihood, prior) when the likelihood is an object like a density or distribution, but having to write PosteriorDensity(log_likelihood, prior) just to make it clear that it returns a log if it’s a function.
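
As a sketch of what such a unit-like wrapper could look like (just an illustration, not the actual BAT.jl type):

# Hypothetical wrapper marking a plain number as a log-space value:
struct LogVal{T<:Real}
    value::T
end

# A consumer can then insist on log-space input and reject bare numbers:
logvalue(x::LogVal) = x.value
logvalue(x::Real) = throw(ArgumentError("expected a log-space value; wrap the result in LogVal"))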

As for auxiliary/incidental computations, it is kind of tricky to decide what should be done with them, or if they are needed. Keeping e.g. 1024 of them around for a NUTS tree of depth 10 may be excessive; I would rather allow them to be calculated on demand with a closure, similarly to how Zygote does backpropagation.

Sure, values that come with extra computational effort should only be calculated on demand. I mean data that is generated inherently as part of the (likelihood or whatever) calculation and that the user wants to preserve; in large-scale applications, that’s not an uncommon requirement.

Of course you’re right, one probably will want to filter/thin it a bit in the output, but that could be controlled by config parameters for the algorithm or so.

I saw that you have a proposal for get_logdensity, get_gradient and get_payload functions in there, so that would go in the direction that @ChrisRackauckas proposed.

But I think it would be important to have this in a very lightweight central package that does nothing else (LogDensityProblems, for example, while I like it, does have its own opinions on how to run AD frameworks and has a load time of 7.5s - not a dependency someone will want to take on just to define a likelihood).

Would there be interest in creating such a standard package that just defines a pure API of functions to access target-function result values?

I guess the common API should be function-based, like Chris said, not type-based.

But not every user will want to define their own output types - if I just write a likelihood function quickly, returning a NamedTuple or a standard centrally defined return type would be convenient.

So if we can agree on some “recipes for target functions” package, maybe its “get-log-value” and “get-log-value-and-gradient” functions (I’ll leave the function names up for discussion) could support (logval = ..., grad_logval = [...]) out of the box. And possibly some other standard return value types that we’d define in that package.

Not really, it just gives you a convenient default. You can calculate derivatives any way you like (including calling AD some other way yourself); see the docs.

The load time is coming from Requires.jl, which is an open issue. Hopefully that will be fixed in the long run by something like Pkg.jl#1285.

I am still not convinced about the benefits of this. I understand that it can be nice to run optimization using different frameworks with a single API, and I think that the optimization packages around Optim.jl have a common interface for that already.

But if I am running MCMC, I have already coded it as a log posterior, and in 90% of cases it will just be bogus to run it as is even for MAP (specifically, for multilevel/nuisance parameters, you often get singularities, so you need a different prior).

(side note about DynamicHMC and log posterior optimization)

The few optimization steps taken by DynamicHMC are a heuristic, and not a particularly good one since they may overdo it and not end up in the typical set. They will be replaced at some point.

Different package authors have different styles, and it may be difficult to cover everything. E.g. certain packages like interfaces with preallocated buffers and mutating functions.

LogDensityProblems, for example, while I like it, does have its own opinions on how to run AD frameworks and has a load time of 7.5s
@Tamas_Papp: The load time is coming from Requires.jl, which is an open issue. Hopefully that will be fixed in the long run

Sure, I know - but the long run might be quite a while yet …

I am still not convinced about the benefits of this. I understand that it can be nice to run optimization using different frameworks with a single API, and I think that the optimization packages around Optim.jl have a common interface for that already

Exactly - I think the approach of having lightweight common “almost-API-only” packages has been extremely successful in the Julia community/ecosystem. We have plotting recipes, we have ZygoteRules and ChainRulesCore, we have NLSolversBase and MathOptInterface, and so on.

So why wouldn’t we profit from having a common API for target functions like densities and loss functions, so that these can be coded up in a way that’s compatible with a variety of algorithm implementations?

Different package authors have different styles, and it may be difficult to cover everything.

Certainly, but I think something like what @ChrisRackauckas proposed

[…] interface be on functions, i.e. value(res) instead of on properties, i.e. res.value.

and like you’re planning yourself with get_logdensity, get_gradient and get_payload could probably go quite a long way in this case.

I’m not talking about a common API for samplers, etc. Just a common API to access parts of the return value of target/loss functions: the (log) value, optionally a gradient, and optionally auxiliary information (what you called a payload).

If you are already thinking about get_logdensity, get_gradient and get_payload (whatever they will be named), surely it would be beneficial to have these in a lightweight central package with almost zero load time and dependencies? Certainly there would be no harm in it. And if we could get packages like (e.g.) DynamicHMC, AdvancedHMC, Optim, etc. to support that interface - maybe in addition to their own style - I think users and generic code that implements density/loss functions could potentially benefit a lot from that.

It would be nice if in the easy case, the user would just have to implement a simple function/closure, with no special methods, and could just return a (logd = ..., [gradient = ...]) - which get_density_logval and friends would know how to handle by default. And in more complex cases anything could be returned, as long as the access-functions get methods for that custom type.
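
Something along these lines, purely as an illustration (the function and type names are placeholders, not an actual package API):

# Hypothetical accessor API that a tiny interface package could define:
function get_density_logval end

# Default method for the simple NamedTuple convention:
get_density_logval(result::NamedTuple) = result.logd

# More complex cases define their own result type and just add a method:
struct MyLikelihoodResult
    logd::Float64
    aux_sums::Vector{Float64}
end

get_density_logval(result::MyLikelihoodResult) = result.logd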

Some packages have a common interface, some don’t. E.g. from the examples you have mentioned, NLSolversBase is perhaps the one that is used most widely… with 13 direct dependents, while there are many more packages doing optimization.

Personally I find the interface a bit too convoluted (with mutating and non-mutating variants, etc.), but I respect that some people find it useful. Making a common API for objective functions would require that we find some common ground though; at this stage of the ecosystem’s development I am not entirely sure this is a reasonable goal.

This is a nice example because it actually exposes a difference in design. My understanding (and I would appreciate being corrected if I am wrong) is that the authors of Turing.jl wrote AdvancedHMC.jl as a backend for use in Turing.jl, while DynamicHMC.jl wants users to code their own log posteriors. I am not sure the intersection is very useful, i.e. whether someone would switch between these two; after all, they do pretty much the same thing.

Again, what I am not convinced about is the use case of switching between MCMC, optimization, and other approaches which need an $\mathbb{R}^n \to \mathbb{R}$ function (with derivatives). While seemingly similar, these usually require a different function.

I recognize that optimization could benefit from a common interface, and AFAIK NLSolversBase.jl aims to do just that. Maybe refining the API for objective functions would be a reasonable interim goal to start with.

NLSolversBase is perhaps the one that is used most widely… with [13 direct dependents] […] Personally I find the interface a bit too convoluted

I fully agree with you, that’s why I wanted to go for something very simple.

As for buffers - I guess that’s not so popular at the moment anyhow, because Zygote doesn’t support array mutation? The gradient may become part of a larger computation after all, so AD would still come into play.

Making a common API for objective functions would require that we find some common ground though

I think your and Chris’ idea about having access functions - esp. three for value, gradient and aux data, like you were also planning - would make for a nice common ground. It’s actually very similar to what I recently started to implement in BAT.jl.

My suggestion to use NamedTuples (I think the access functions should still support them) was motivated by the fact that we wouldn’t even need a central package. But Chris is right, functions should be at the center.

And of course you’re right, it would only work if package authors are willing to adopt it.

at this stage of the ecosystem’s development I am not entirely sure this is a reasonable goal.

Well, don’t fault me for trying. :slight_smile:

authors of Turing.jl wrote AdvancedHMC.jl as a backend for use in Turing.jl, while DynamicHMC.jl wants users to code their own log posteriors

Well, we had this discussion a while ago (ANN: DynamicHMC 2.0 - #14 by oschulz) - from what I understood, AdvancedHMC is definitely also meant to be used standalone.

It works fine that way; we actually added AdvancedHMC as a sampling backend to BAT.jl recently (BAT is about providing comfort for users who bring a likelihood, so very different from Turing), and I was thinking about adding DynamicHMC too. I can’t speak for the Turing authors, of course, but the API of AdvancedHMC is nicely decoupled from Turing itself.

Again, what I am not convinced about is the use case of switching between MCMC, optimization

Well, I often want to sample a posterior and then use an optimizer to find the global mode, for example (e.g. starting with the max-posterior sample). Not for MAP as the primary result - we usually absolutely want to have the credible region, but we want to know where the mode is, too. So here I want to run sampling and optimization on exactly the same target function. And then I may want to integrate the same function (its exp, since the posterior returns its log, of course) to get the evidence. So here I run all three - sampling, optimization and integration - on the same function. And some, all, or none of them may want a gradient, and I may or may not want to use AD.

And sometimes, I may want to use more than one sampler (provided more than one can handle the problem) to sample a posterior, to ensure I can trust a result (before publication).

From a user point of view, it would be really nice to just write a plain function, and to have some standard mechanism to declare that it returns a log-space value and to optionally attach a gradient and aux info - in a standardized, simple return value/structure. One shouldn’t have to implement custom methods, define custom types, etc. just to define a likelihood function (just as an example). Of course, complex cases may require something beyond that, but for many use cases that would be enough.

I just had a feeling that maybe the time was right for a lightweight API or conventions for density/loss/target-functions, since they are such a central concept. But maybe the time isn’t right just yet?

My point (which is well known) was that it is very easy to get (otherwise proper, well-behaved) posteriors where the “mode” is at a singularity. This happens all the time in multilevel models, and can be demonstrated nicely with Neal’s funnel at $x_i \to 0$, $y \to -\infty$. Because of this, optimizing the log posterior is not really useful for models above a certain complexity, because it actually takes you away from the typical set (and it is not needed for simple models, where you can start pretty much anywhere).

I am not sure; the people who maintain the various optimization packages would be the best to ask here. In any case, I would start with optimization and see where that goes. FWIW, it is my impression that experimentation is still going on with various interfaces, e.g. OptimKit.jl is a nice recent example.


Certainly, there are such cases. But we very often have physics applications where the model isn’t multilevel, or at least not with many levels, and where the posterior is well behaved. In such cases, it’s quite common in our community to determine the mode as well as credible regions and other quantities.

There are nasty cases too, of course … but many times, the real world is actually kind to us. :slight_smile: