- SimplexBijector{Val{true}}() #default transformer - a softmax transform which is non-bijective. …

Everything in this point is correct, except that it’s *not* a softmax transform (this is what I tried to correct in my previous comment; sorry about that!). It’s the same stick-breaking transformation as `SimplexBijector{Val{false}}`, but with the last component set to zero.

And you’re correct that the abs-det-log-jacobian term doesn’t make any sense in this particular case, since for the last component we’d have a zero along the diagonal, which we’d subsequently attempt to take the log of. But if you consider the transformation as only acting on an (n - 1)-dimensional subspace, we end up dropping the last component, and hence the jacobian is invertible (and (n - 1) × (n - 1) rather than n × n). This way everything works out.
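If it helps, here’s a minimal sketch of that in practice, assuming a Bijectors.jl version that exports `bijector` and `logabsdetjac`: the default transform pins the last unconstrained coordinate to zero, yet the log-abs-det-jacobian stays finite because it’s computed on the (n - 1)-dimensional subspace:

```
using Bijectors, Distributions

d = Dirichlet(3, 1.0)
b = bijector(d)     # the default SimplexBijector for a Dirichlet
x = rand(d)         # a point in the interior of the 3-simplex
y = b(x)            # stick-breaking output; y[end] == 0 by construction
logabsdetjac(b, x)  # finite, since the zero component is dropped
```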

- SimplexBijector{Val{false}}() …

You’ve understood this correctly, but

```
logpdf(distr_trans2, inv_bij2([1., 2, 0])) == logpdf(distr_trans2, inv_bij2([1., 2, 300])) # FALSE
```

should not happen. It *does* happen, but due to numerical issues rather than an issue with the transform itself:

```
b = Bijectors.SimplexBijector(false)  # the proj = false variant
ib = inv(b)                           # inverse: unconstrained space -> simplex
ib([1., 2, 0]), ib([1., 2, 300]) # => ([0.576117, 0.373355, 0.0505281], [0.576117, 0.373355, 0.0])
sum.((ib([1., 2, 0]), ib([1., 2, 300]))) # => (1.0, 0.949471894068249), i.e. the second sum ≠ 1
```
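For intuition (a generic illustration of the failure mode, not the exact code path inside Bijectors.jl): the inverse stick-breaking passes each coordinate through a logistic, which saturates to exactly 1.0 in Float64 well before an input of 300, so the corresponding stick length underflows to zero and the result no longer sums to 1:

```
logistic(y) = 1 / (1 + exp(-y))  # hypothetical stand-in for the internal link
logistic(0.0), logistic(300.0)   # => (0.5, 1.0): saturated at 300
1 - logistic(300.0)              # => 0.0, the probability mass that gets lost
```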

**i)** Is that about right? What I do not truly get is why I have to (or am I wrong?) add a 0 to the `K - 1` sampled parameters in order to transform back via the `SimplexBijector{Val{false}}` bijector such that the constrained parameters sum to 1. I guess the advantage of handling the ‘0’ internally does not outweigh the additional difficulties that you have when you need to add the likelihood (wrt the constrained K-dimensional parameter) to the transformed (prior) distribution?

I’m not entirely certain what you mean by *“does not outweigh the additional difficulties that you have when you need to add the likelihood (wrt the constrained K-dimensional parameter) to the transformed (prior) distribution”*. Hopefully this is answered by my explanation above. Just to restate: the inverses of the two transformations are identical when restricted to the *image* of the transformation `b` (i.e. (n - 1)-dimensional real space), but outside of this image `proj = true` is NOT a bijection (though it will still give you something that lies on the unit simplex), while `proj = false` is still a bijection, but one whose output does not necessarily lie on the unit simplex.
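To make that concrete, here’s a hedged sketch using the `Val`-parameterized constructors from this thread (newer Bijectors.jl versions spell the constructor differently), comparing the two inverses on and off the image:

```
using Bijectors, Distributions

b_proj   = Bijectors.SimplexBijector{Val{true}}()   # proj = true (the default)
b_noproj = Bijectors.SimplexBijector{Val{false}}()  # proj = false

x = rand(Dirichlet(3, 1.0))
y = b_proj(x)                      # in the image of the transform: y[end] == 0

inv(b_proj)(y) ≈ inv(b_noproj)(y)  # true: the inverses agree on the image

y_off = [y[1], y[2], 5.0]          # perturb y[end]: now off the image
sum(inv(b_proj)(y_off))            # ≈ 1.0: projected back onto the simplex
sum(inv(b_noproj)(y_off))          # ≠ 1.0 in general: bijective, but off-simplex
```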

**ii)** In the code above, only the first two elements of the gradient are non-zero, but this time it matches the dimension of the transformed parameter (excluding the 0 that I need to add before taking the inverse), and the logpdf_transformed now depends on all parameters. I would assume that Turing would use this transform instead of the default one, and then use only gradient information for the first K - 1 parameters? If so, is there a reason why `transform(Dirichlet(..))` assigns `SimplexBijector{Val{true}}()` by default? I assumed that the other option would be preferred in this case?

You’re correct that we use only the first K - 1 parameters, but I’m not sure why you’d then prefer `proj = false`: you’d be doing extra work only to throw it away afterwards. The default can indeed cause issues with the jacobian being singular/non-invertible, but, as explained above, this is handled appropriately internally.
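As a final sketch (assuming `transformed` from Bijectors.jl and ForwardDiff.jl for the gradient): with the default bijector the transformed log-density never touches the last unconstrained coordinate, so its gradient entry comes out as zero and sampling effectively happens in K - 1 dimensions:

```
using Bijectors, Distributions, ForwardDiff

td = transformed(Dirichlet(3, 1.0))  # transformed distribution, default bijector
g  = ForwardDiff.gradient(y -> logpdf(td, y), [0.3, -0.2, 0.0])
g[end]                               # == 0.0: the last coordinate is never used
```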

Hopefully this helps! I do agree it’s all quite confusing.