TransformVariables.jl vs Bijectors.jl

I’ve been using @Tamas_Papp’s TransformVariables.jl for quite a while in my probabilistic programming work, and I’m very happy with it.

Not long ago (weeks? months? one of those) I learned of Bijectors.jl, which looks mostly to be the work of @mohamed82008, @yebai, and @torfjelde.

These packages seem to serve essentially the same purpose, with some differences in capabilities. TransformVariables has great support for named tuples, for example, while Bijectors seems unique in its coverage of transforms for some distributions, and also has some really nice support for normalizing flows.

I would love for all of this to be available in a single package. Could there be a path toward this?

If you are missing something in TransformVariables.jl, I am happy to take PRs (most of the time it is quite straightforward to code your own transformations, eg I coded the existing ones from the Stan manual; then you can test the Jacobian with AD).
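As a minimal sketch of that AD check (assuming ForwardDiff.jl for the derivative; the package itself may test differently):

using TransformVariables, ForwardDiff

t = as(Real, 0, ∞)                       # a scalar transformation ℝ → (0, ∞)
x = 0.7
y, logjac = transform_and_logjac(t, x)   # transformed value and log |dy/dx|

# compare the analytic log Jacobian against an AD derivative
ad_deriv = ForwardDiff.derivative(x -> transform(t, x), x)
@assert logjac ≈ log(abs(ad_deriv))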

Since the predecessor of TransformVariables.jl (ContinuousTransformations.jl) was around for a while when Bijectors.jl was started, I am assuming that the TuringLang contributors had good reasons to start their own package, but I don’t know what these were.


I cannot speak to the reasons why Bijectors.jl was started initially, as I was not part of the team then (nor do I know what the other options were at the time). I did consider TransformVariables.jl, which is a really nice package, prior to my work on Bijectors.jl, and it seems to be perfect for transforming distributions from constrained to unconstrained space and vice versa.

But such transforms are really nothing more than functions with some additional structure, e.g. invertibility, and so I found myself really wanting a package where this is how it felt: you call a bijector as you would a function, you invert a bijector as you would a real number, you compose bijectors as you would functions, etc. As long as you write type-stable code, you also get a lot of neat possibilities when it comes to transformations at compile time, e.g. b ∘ inv(b) becomes identity. The distribution-related functionality is then kind of “separate”: a TransformedDistribution is simply the push-forward of some distribution d induced by a bijector b. This way you can do all kinds of nonsense with the bijectors themselves and then use the resulting bijector to push forward any supported distribution, e.g. for normalizing flows or for use in ADVI.
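A minimal sketch of that function-like feel (using the API roughly as of this thread; newer releases spell inv as inverse):

using Bijectors, Distributions

d = Beta(2, 2)
b = bijector(d)         # bijector mapping the support (0, 1) to ℝ
y = b(rand(d))          # call it like a function
x = inv(b)(y)           # invert it; b ∘ inv(b) simplifies to identity, as above

td = transformed(d, b)  # push-forward of d under b, a distribution on ℝ
logpdf(td, y)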

In conclusion, there are certain areas where Bijectors.jl and TransformVariables.jl overlap. For applying a “non-composed” transformation to a named tuple, I think TransformVariables.jl is really nice. If you’re doing complex stuff with the bijectors themselves, especially compositions, I think Bijectors.jl is the way to go.

As for combining efforts: technically (as far as I can tell) Bijectors.jl can do what TransformVariables.jl can do, but we don’t have the nice interface with named tuples. It should be possible to implement such an interface for Bijectors.jl, though; we already have a similar thing for stacking together scalar and vector transformations in Stacked <: Bijector, which uses index ranges rather than names to decide where to apply which transform.
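For instance, a rough sketch of Stacked (the constructor details may differ across versions, so treat this as illustrative):

using Bijectors, Distributions

# apply a different bijector to each index range of a flat vector
b = Stacked((bijector(Beta(2, 2)), bijector(Exponential(1.0))), (1:1, 2:2))
b([0.3, 1.2])   # logit on the first entry, log on the second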

Unfortunately, I think moving Bijectors.jl to TransformVariables.jl would be difficult due to the significant difference in the underlying design (which I personally like quite a lot :/).


Thanks Tor! I wonder to what extent these can be used together, say with TransformVariables for the named-tuple interface and Bijectors for the flows, etc. I’d guess the weirdest part of that could be getting the log Jacobians wired together right.


I think the easiest approach would be to just write wrap and unwrap methods around Bijectors.jl: take each of the elements in the named tuple, apply the corresponding transforms separately, and then wrap the result into a named tuple again. This way you don’t have to go through the pain of somehow making the compositions work nicely.

And since we’re working with named tuples, you could do this using @generated for maximal efficiency. :)
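Something along these lines, as a rough sketch (transform_nt is a hypothetical helper, not part of either package; a @generated version could unroll the map):

using Bijectors, Distributions

# hypothetical helper: apply a NamedTuple of bijectors fieldwise
function transform_nt(bs::NamedTuple{names}, xs::NamedTuple{names}) where {names}
    NamedTuple{names}(map((b, x) -> b(x), values(bs), values(xs)))
end

bs = (a = bijector(Beta(2, 2)), b = bijector(Exponential(1.0)))
xs = (a = 0.3, b = 1.2)
transform_nt(bs, xs)   # (a = logit(0.3), b = log(1.2))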


You’d still have to manage the Jacobians though, right? Since the two packages have different mechanisms for tracking this.

Makes me wonder about constructors for wrapping a bijector in a transform, or vice versa.

Indeed, calculating the log Jacobian is the reason why what TransformVariables calls “aggregators” (transformations to arrays, tuples, and named tuples) are themselves considered transformations.

I have been thinking of decoupling these two things, but I have yet to come up with an interface that is actually an improvement over the current one.
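To illustrate, a minimal sketch of the current interface, where the named-tuple aggregator accumulates the log Jacobian across its components:

using TransformVariables

t = as((σ = asℝ₊, ρ = as(Real, -1, 1)))  # aggregator over two scalar transformations
x = randn(dimension(t))                   # a point in unconstrained ℝ²
y, logjac = transform_and_logjac(t, x)    # named tuple plus accumulated log Jacobian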


@torfjelde : Sorry to dig out a relatively old post, but congratulations on this package! It’s really nice and useful!

One question that immediately came up for me: the package’s mapping functions keep the same dimensionality when mapping from a constrained to an unconstrained space, even when the dimension could potentially be reduced, for instance when you map a simplex or a covariance matrix. Example:

using Bijectors, Distributions

prior_dist = Dirichlet([1, 2, 3, 4])    # support is the 4-simplex ⊂ ℝ^4
bi = Bijectors.transformed(prior_dist)  # push-forward onto unconstrained space
rand(bi)                                # still ∈ ℝ^4, with the last element fixed to 0

Is there a specific reason for that (I know the module is called Bijectors, but still)? I figured a big reason for this package is its support for Turing, and decreasing the dimensionality should be good for their supported samplers. In a big model where many such parameters are mapped, one could end up with a substantially lower-dimensional unconstrained space.

Best regards,

Edit:
I went through the Stan manual, which performs said dimension-reducing transformation in its sampler, but it’s not so easy to actually get the dimensions right when you calculate gradients with respect to the unconstrained space and vice versa. I suppose keeping the dimensions the same and constraining the extra element to zero makes the coding quite a bit more comfortable and saves you the cost of wrangling Jacobians with respect to the reduced unconstrained parameters, at the price of a larger dimension to sample from.
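For reference, a rough sketch of the dimension-reducing simplex transform from the Stan manual (stick-breaking; my own hedged translation, not code from either package):

using StatsFuns: logistic

# maps y ∈ ℝ^(K-1) to a point on the K-simplex, i.e. one fewer free dimension
function stickbreak(y::AbstractVector)
    K = length(y) + 1
    x = similar(y, float(eltype(y)), K)
    remaining = one(eltype(x))
    for k in 1:(K - 1)
        z = logistic(y[k] - log(K - k))  # the shift centers y = 0 at the uniform point
        x[k] = remaining * z
        remaining -= x[k]
    end
    x[K] = remaining
    return x
end

stickbreak(zeros(3))  # ≈ [0.25, 0.25, 0.25, 0.25]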