Simplex transforms in Bijectors.jl (sorry)

I’d like to understand the ideas behind the way Bijectors handles simplices.

First, something I think I understand, for comparison:

TransformVariables

julia> using TransformVariables

julia> t = UnitSimplex(4)
UnitSimplex(4)

julia> t(ones(3))
4-element Vector{Float64}:
 0.4753668864186717
 0.3022499950414722
 0.16257508655105501
 0.05980803198880098

A space with four numbers that add to one has three degrees of freedom, so the input here takes ones(3). Giving it more throws an error:

julia> t(ones(4))
ERROR: ArgumentError: dimension(t) == length(x) must hold. Got
dimension(t) => 3
length(x) => 4
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/ArgCheck/5xEDR/src/checks.jl:243 [inlined]
 [2] transform
   @ ~/.julia/packages/TransformVariables/N5VYB/src/generic.jl:228 [inlined]
 [3] (::UnitSimplex)(x::Vector{Float64})
   @ TransformVariables ~/.julia/packages/TransformVariables/N5VYB/src/generic.jl:123
 [4] top-level scope
   @ REPL[14]:1 

This is all as I would expect.

Bijectors

The corresponding code in Bijectors seems to be

julia> using Bijectors

julia> t = Bijectors.SimplexBijector()
Bijectors.SimplexBijector{1, true}()

julia> inv(t)(ones(4))
4-element Vector{Float64}:
 0.4753668864186717
 0.30224999504147226
 0.16257508655105507
 0.059808031988801025

It’s the same result (up to what looks like maybe rounding error). But it requires one more dimension than is needed! To confirm this,

julia> inv(t)(ones(4)) |> t
4-element Vector{Float64}:
 1.0000000000000002
 1.0
 1.0
 0.0

This shows that the t is not a bijection. Given that it’s possible to set up so it is a bijection, what’s behind this design choice?

1 Like

I think it’s mostly historical. In Turing, we assume the transformed and untransformed domains have the same dimension (for simplicity I believe). Bijectors was born out of Turing’s code base and it’s used primarily there so it had to be made compatible. I am starting to appreciate the conceptual simplicity of separating the dimension of the input from that of the output more though and so it seems the other developers of Bijectors as well. For example, see the comments in https://github.com/TuringLang/Bijectors.jl/pull/168 and https://github.com/TuringLang/Bijectors.jl/issues/58. However, perhaps the biggest obstacle to this change would be making Turing/DynamicPPL compatible with it. This would require a more significant effort.

2 Likes

That helps a lot, thanks @mohamed82008

1 Like

I am using Bijectors for almost all my work in Bayesian settings, it is a really nice and useful package!

But I also have questions regarding possible dimensionality reductions. If you work with larger Covariance Matrices, or many Simplices, the number of parameters in the transformed dimension could be quite considerably lower if the dimension would not be kept constant.

I have not looked at the internals of Turing, but do you keep the dimension the same also for your MCMC sampler? If yes, do you experience performance degradation in this case? I.e., the posterior covariance adaption could be done in a lower dimension, and the turn statistics for NUTS are based on the momentum, which depend on the parameter dimension.

I think for more general use (especially outside Turing) we need something like Bijectors but without the dimensionality issues you describe. Maybe some of the Bijectors should be ported to TransformVariables?

Orthogonally, I appreciate that TV is less entwined with Distributions.jl; that makes it easier for me to use from MeasureTheory.jl.

In my case, I really appreciate it that Bijectors is so connected to Distributions. If a distribution is contained in the Distributions package, then most likely I can transform/inverse transform it via Bijectors, and I can take gradients via mutliple AD packages thanks to DistributionsAD. I actually would find it amazing if this would all be one package, but that would probably be too big, and supporting many AD backends seems like a lot of work as well.

1 Like

It could be fine if it were “available but not required”. Bijectors re-exports so many names from Distributions, which collide with MeasureTheory. A lot of this is on me, but I don’t see a good way for a Distributions alternative to have no names in common with it. Also, I haven’t brought this up to @torfjelde and @mohamed82008 before, because the dimensionality problem made it a no-go for me anyway.

I think these features are pretty orthogonal though — you can only transform (log) pdfs for multivariate distributions in any case, and the ADability of the transformation and the original pdf need not be coupled at all.

1 Like

Weell, one might also say that the implementations you demonstrate above isn’t bijective since the output isn’t constrained in anyway, e.g. I could pass the inverse of the transform any length 4 vector, not just those in the subspace.

But I also want to point out that there’s really nothing that fundamentally stops us from making SimplexBijector transform to space with length(input) + 1. We could just add a type-parameter that specifies whether or not this should be done, and implement the transforms.

And regarding the re-export, that is def something we could have a discussion about doing. I’m not attached to that at all (not a big fan even) :slight_smile:

Also, IMO Bijectors.jl is ripe for a re-design. I’m not a big fan of the nasty dimensionality that is encoded in the bijectors (as I mention in one of the issues raised above), etc. So very open to ideas/suggestions/etc. about the design.

1 Like

That’s true, at some point it becomes a contract with the user. Maybe the biggest difference is that this is a contract not to do something (“Please don’t pass values outside [0,1]”), which can be easier to deal with that a contract to do something (“Please add extra code to ignore the final element in this particular case”).

Thanks for the suggestions. I’d really love to be able to use Bijectors from MeasureTheory. I have some code for expressing pushforwards and pullbacks using TransformVariables, and it would be great to have this for Bijectors as well. Maybe at some point these can merge or interoperate more easily, but for now neither has the complete functionality, so we need both.

1 Like