- SimplexBijector{Val{true}}() #default transformer - a softmax transform which is non-bijective. …
Everything in this point is correct, but it's not a softmax transform (this is what I tried to correct in my previous comment; sorry about that!). It's the same stick-breaking transformation as `SimplexBijector{Val{false}}`, but with the last component set to zero.
And you're correct that the abs-det-log-jacobian term doesn't make any sense in this particular case, since the last component would put a zero on the diagonal, which we'd subsequently attempt to take the log of. But if you consider the transformation as only acting on an (n - 1)-dim subspace, we end up dropping the last component, and hence the jacobian is invertible (and (n - 1) by (n - 1) rather than n by n). This way everything works out.
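A minimal sketch of that point, assuming Bijectors.jl's `bijector` and `logabsdetjac` functions (the variable names are just for illustration):

```julia
using Bijectors, Distributions

d = Dirichlet(3, 1.0)
b = bijector(d)        # the default simplex bijector (proj = true)
x = rand(d)            # a point on the unit simplex
y = b(x)               # unconstrained vector; the last component is set to 0

# This is finite, even though the full n-by-n jacobian would be singular:
# the term is computed for the (n - 1)-by-(n - 1) jacobian of the transform
# restricted to the (n - 1)-dim subspace.
logabsdetjac(b, x)
```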
- SimplexBijector{Val{false}}() …
You’ve understood this correctly, but
logpdf(distr_trans2, inv_bij2([1., 2, 0])) == logpdf(distr_trans2, inv_bij2([1., 2, 300])) # FALSE
should not happen. It does happen, but that's due to numerical issues rather than an issue with the transform itself:
```julia
using Bijectors

b = Bijectors.SimplexBijector(false)   # proj = false
ib = inv(b)
ib([1., 2, 0]), ib([1., 2, 300])          # => ([0.576117, 0.373355, 0.0505281], [0.576117, 0.373355, 0.0])
sum.((ib([1., 2, 0]), ib([1., 2, 300])))  # => (1.0, 0.949471894068249)
```
i) Is that about right? What I do not truly get is why I have to (or am I wrong?) add a 0 to the K - 1 sampled parameters in order to transform back via the `SimplexBijector{Val{false}}` bijector such that the constrained parameters sum up to 1. I guess the advantage of handling the '0' internally does not outweigh the additional difficulties that you have when you need to add the likelihood (w.r.t. the constrained K-dimensional parameter) to the transformed (prior) distribution?
I'm not entirely certain what you mean by "does not outweigh the additional difficulties that you have when you need to add the likelihood (w.r.t. the constrained K-dimensional parameter) to the transformed (prior) distribution". Hopefully this is answered by my explanation above. Just to restate: the inverses of both transformations are identical when restricted to the image of the transformation `b` (i.e. (n - 1)-dim real space). Outside of this image, `proj = true` is NOT a bijection but will give you something that lies on the unit simplex, while `proj = false` is still a bijection but its output does not necessarily lie on the unit simplex.
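To restate that in code (just a sketch, reusing the `Bijectors.SimplexBijector(proj)` constructor and the example vectors from the snippet above):

```julia
using Bijectors

b_proj   = Bijectors.SimplexBijector(true)    # proj = true (the default)
b_noproj = Bijectors.SimplexBijector(false)   # proj = false

y_on  = [1., 2, 0]      # lies in the image of the transform (last component is zero)
y_off = [1., 2, 300]    # does NOT lie in the image

inv(b_proj)(y_on) ≈ inv(b_noproj)(y_on)   # the two inverses agree on the image
inv(b_proj)(y_off) ≈ inv(b_proj)(y_on)    # proj = true ignores the last component (not injective)
sum(inv(b_noproj)(y_off))                 # ≈ 0.9495: proj = false stays injective but leaves the simplex
```

This is also why appending a 0 before inverting with `proj = false` lands you back on the simplex: with a zero last component you're in the image of the transform, where the two inverses coincide.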
ii) In the code above, only the first two elements of the gradient are non-zero, but this time it matches the dimension of the transformed parameter (excluding the 0 that I need to add before taking the inverse), and `logpdf_transformed` now depends on all parameters. I would assume that Turing would use this transform instead of the default one, and then use only the gradient information of the first K - 1 parameters? If so, is there a reason why `transform(Dirichlet(..))` assigns `SimplexBijector{Val{true}}()` by default? I assumed that the other option would be preferred in this case?
You're correct that we use only the first K - 1 parameters, but I'm not sure why you'd then prefer `proj = false`: you'd be doing extra work only to throw it away afterwards. The default (`proj = true`) can indeed cause issues with the jacobian being singular/non-invertible, but, as explained above, this is handled appropriately internally.
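So in practice you can just rely on the default, e.g. something along these lines (a sketch using Bijectors.jl's `bijector`/`transformed`; the variable names are just for illustration):

```julia
using Bijectors, Distributions

d  = Dirichlet(3, 1.0)
b  = bijector(d)          # the default simplex bijector (proj = true)
td = transformed(d, b)    # corresponding distribution on the unconstrained space

x = rand(d)               # constrained draw, lies on the simplex
y = b(x)                  # unconstrained draw; only the first K - 1 entries carry information
logpdf(td, y)             # logpdf in unconstrained space, jacobian correction included
```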
Hopefully this helps! I do agree it's all quite confusing.