Hi Folks,
I experimented a bit this morning with some “automatic” non-centered Normal distributions, and I think I have something that works quite well. Many of you will know the trick of replacing
mu ~ <mu prior>
sigma ~ <sigma prior>
xs ~ filldist(Normal(mu, sigma), N)
xs_observed ~ arraydist([Normal(x,s) for (x,s) in zip(xs, sigma_xs_observed)])
when the posterior on xs is dominated by the prior, with the non-centered version
mu ~ <mu prior>
sigma ~ <sigma prior>
xs_unit ~ filldist(Normal(0,1), N)
xs = mu .+ sigma .* xs_unit
xs_observed ~ arraydist([Normal(x,s) for (x,s) in zip(xs, sigma_xs_observed)])
to avoid the pathological “Neal’s funnel” geometry in the model and dramatically improve sampling. But this is inconvenient in a lot of ways, among them that the xs are now a generated quantity and so aren’t automatically recorded in the chain. And while in this case it’s simple to see how to shift and scale the funnel away, sometimes you want to shift and scale by a mix of the prior and the observations, and sometimes even more complicated expressions are involved, and so on.
It would be great to have a distribution type that takes care of this stabilizing shift-and-scale for you, so I wrote one:
using Bijectors
using ChangesOfVariables
using Distributions
using Random
using Turing

# A Normal(mu, sigma) distribution that asks Turing to sample in the
# shifted-and-scaled coordinates y = (x - mu_centering) / sigma_centering.
struct NonCenteredNormal{T <: Real} <: ContinuousUnivariateDistribution
    mu::T
    sigma::T
    mu_centering::T
    sigma_centering::T
end

# The shift-and-scale transform the sampler sees.
struct NonCenteredNormalBijector{T <: Real} <: Bijector
    mu::T
    sigma::T
end

# By default, center on the prior's own location and scale.
NonCenteredNormal(mu, sigma) = NonCenteredNormal(mu, sigma, mu, sigma)

# Promote mixed argument types (e.g. AD dual numbers for mu and sigma
# alongside plain floats used as centering constants).
NonCenteredNormal(mu, sigma, mu_c, sigma_c) = NonCenteredNormal(promote(mu, sigma, mu_c, sigma_c)...)

# Forward transform x -> y = (x - mu) / sigma, with log|dy/dx| = -log(sigma).
function ChangesOfVariables.with_logabsdet_jacobian(b::NonCenteredNormalBijector{T}, x) where T <: Real
    y = (x - b.mu) / b.sigma
    logdetjac = -log(b.sigma)
    y, logdetjac
end

# Inverse transform y -> x = sigma*y + mu, with log|dx/dy| = log(sigma).
function ChangesOfVariables.with_logabsdet_jacobian(ib::Inverse{NonCenteredNormalBijector{T}}, y) where T <: Real
    x = y * ib.orig.sigma + ib.orig.mu
    logdetjac = log(ib.orig.sigma)
    x, logdetjac
end

# The density and random draws are just those of Normal(mu, sigma); only
# the parameterization the sampler sees changes.
Distributions.logpdf(d::NonCenteredNormal{T}, x::S) where {T <: Real, S <: Real} = logpdf(Normal(d.mu, d.sigma), x)
Distributions.rand(rng::AbstractRNG, d::NonCenteredNormal{T}) where T = rand(rng, Normal(d.mu, d.sigma))
Bijectors.bijector(d::NonCenteredNormal{T}) where T = NonCenteredNormalBijector(d.mu_centering, d.sigma_centering)
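As a quick sanity check (my addition, not required by Turing), the forward and inverse transforms should round-trip, with log-Jacobians that cancel; this assumes a Bijectors version where inverse(b) builds the Inverse wrapper used above:

# Round-trip check with arbitrary test values.
b = NonCenteredNormalBijector(1.5, 2.0)
y, ldj_fwd = with_logabsdet_jacobian(b, 0.7)         # y = (0.7 - 1.5) / 2.0
x, ldj_inv = with_logabsdet_jacobian(inverse(b), y)
@assert isapprox(x, 0.7) && isapprox(ldj_fwd + ldj_inv, 0; atol = 1e-12)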
Now you can write models like the one above cleanly while still taking advantage of the shift-and-scale transformation:
mu ~ <mu prior>
sigma ~ <sigma prior>
xs ~ filldist(NonCenteredNormal(mu, sigma), N)
xs_observed ~ arraydist([Normal(x,s) for (x,s) in zip(xs, sigma_xs_observed)])
and Turing will take care of the non-centering behind the scenes for you. Optionally, if you don’t have a prior-dominated posterior, you can use NonCenteredNormal(mu, sigma, location_to_center, scale_to_center): the sampler will then see parameters with scale scale_to_center around location location_to_center, while the N(mu, sigma) prior is still applied.
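For completeness, here’s an end-to-end sketch with synthetic data; everything from the priors to the NUTS settings is illustrative rather than a recommendation:

# Synthetic data: N latent values observed with known noise.
N = 20
sigma_xs_observed = fill(0.1, N)
xs_obs = randn(N) .+ sigma_xs_observed .* randn(N)

@model function hierarchy(xs_observed, sigma_xs_observed)
    N = length(xs_observed)
    mu ~ Normal(0, 10)                        # placeholder prior
    sigma ~ truncated(Normal(0, 5); lower=0)  # placeholder prior
    # The sampler works in (xs .- mu) ./ sigma coordinates, but xs
    # themselves are recorded in the chain.
    xs ~ filldist(NonCenteredNormal(mu, sigma), N)
    xs_observed ~ arraydist([Normal(x, s) for (x, s) in zip(xs, sigma_xs_observed)])
end

chain = sample(hierarchy(xs_obs, sigma_xs_observed), NUTS(), 1000)

And in a data-dominated problem, one could center each latent value on its own observation with something like xs ~ arraydist([NonCenteredNormal(mu, sigma, xo, so) for (xo, so) in zip(xs_observed, sigma_xs_observed)]).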
Is this too specialized to put into Turing? Does everyone who knows about this trick just do it by hand, and are they happy to continue doing so? Thoughts on the code?
Cheers,
Will