There are still some problems with the code that you might want to fix:
- It still uses the deprecated `MvNormal` constructors (see the small sketch after this list).
- The Horseshoe+ model seems to be wrong. I didn’t study all the details, but `y` does not depend on `X`: I think you only specified the prior of the coefficients and forgot the intercept, the noise, and putting them together (similar to the original model). Maybe similar updates are needed in the third variant; I didn’t check it carefully.
- The `InverseGamma` distribution in the third variant is only supported on the positive real line, and hence it is not necessary to truncate it.
- There are some redundant computations and allocations that can be avoided and might lead to minor performance improvements.
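To clarify the first point, here is the small sketch: the `MvNormal` constructors that take standard deviations directly are deprecated, and one should pass the covariance explicitly instead. This assumes the scalar/vector standard deviation forms are the ones in your code:

using Distributions
using LinearAlgebra

σ = 0.5

# Deprecated: scalar or vector interpreted as standard deviation(s)
# MvNormal(zeros(3), σ)
# MvNormal(zeros(3), fill(σ, 3))

# Non-deprecated replacements: pass the covariance matrix explicitly
MvNormal(zeros(3), σ^2 * I)
MvNormal(zeros(3), Diagonal(fill(σ^2, 3)))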
> That I don’t know how to do it yet.
Hopefully the following code illustrates what I mean. It also contains some other fixes, but the main point is the use of functors without `X` and `y` arguments. Of course, one could use different types, but I wanted to show that multiple definitions + dispatching work just fine (also a recent bug fix that this works for `Val` etc.):
using Turing
using FillArrays
using LinearAlgebra
struct HorseShoePrior{T}
    X::T
end
@model function (m::HorseShoePrior)(::Val{:original})
    # Independent variable
    X = m.X
    J = size(X, 2)
    # Priors
    halfcauchy = truncated(Cauchy(0, 1); lower=0)
    τ ~ halfcauchy
    λ ~ filldist(halfcauchy, J)
    β ~ MvNormal(Diagonal((λ .* τ).^2)) # Coefficients
    α ~ TDist(3) # Intercept
    σ ~ Exponential(1) # Errors
    # Dependent variable
    y ~ MvNormal(α .+ X * β, σ^2 * I)
    return (; τ, λ, α, β, σ, y)
end
@model function (m::HorseShoePrior)(::Val{:+})
    # Independent variable
    X = m.X
    J = size(X, 2)
    # Priors
    τ ~ truncated(Cauchy(0, 1/J); lower=0)
    η ~ truncated(Cauchy(0, 1); lower=0)
    λ ~ filldist(Cauchy(0, 1), J)
    β ~ MvNormal(Diagonal(((η * τ) .* λ).^2)) # Coefficients
    α ~ TDist(3) # Intercept
    σ ~ Exponential(1) # Errors
    # Dependent variable
    y ~ MvNormal(α .+ X * β, σ^2 * I)
    return (; τ, η, λ, β, y)
end
@model function (m::HorseShoePrior)(::Val{:finish}; τ₀=3, ν_local=1, ν_global=1, half_slab_df=2, slab_scale=2)
    # Independent variable
    X = m.X
    J = size(X, 2)
    # Priors
    β₀ ~ Flat()
    logσ ~ Flat()
    σ = exp(logσ) # Noise std
    z ~ MvNormal(Zeros(J), I) # Standardized (unscaled) coefficients
    λ ~ filldist(truncated(TDist(ν_local); lower=0), J) # Local shrinkage
    τ ~ (τ₀ * σ) * truncated(TDist(ν_global); lower=0) # Global shrinkage
    c_aux ~ InverseGamma(half_slab_df, half_slab_df)
    # Dependent variable
    c = slab_scale * sqrt(c_aux) # Slab scale
    λtilde = λ ./ hypot.(1, (τ / c) .* λ) # Regularized local shrinkage
    β = τ .* z .* λtilde # Regression coefficients
    f = β₀ .+ X * β # Latent function values
    y ~ MvNormal(f, σ^2 * I)
    return (; τ, σ, logσ, λ, λtilde, z, c, c_aux, β₀, β, f, y)
end
Let us create some toy data:
julia> X = randn(2, 4);
We can create a model as usual:
julia> model_original = HorseShoePrior(X)(Val(:original));
The main point is that neither `X` nor `y` is an argument of the model:
julia> model_original.args
(##arg#405 = Val{:original}(),)
julia> DynamicPPL.inargnames(@varname(X), model_original)
false
julia> DynamicPPL.inargnames(@varname(y), model_original)
false
but `X` is captured by the model function:
julia> model_original.f
HorseShoePrior{Matrix{Float64}}([0.9008876947448639 0.9172982651499152 -1.7703753804116238 1.5009865604420913; 0.15786060957092293 0.6001936008984858 0.8759197705885826 0.9490750485288362])
Since `y` is not an argument of the model, we can sample from the prior without having to use another model instance with `y=missing`. Instead we can just use
julia> priorsample_original = rand(model_original)
(τ = 3.695731988541993, λ = [0.7107394347599458, 1.0038839300698026, 0.026017515446747786, 0.7473783016558637], β = [-1.4881921127500568, 1.7428254134564192, -0.014855283604250714, -4.405199362018812], α = 1.9879125028306674, σ = 0.5943693342886811, y = [-3.9333591000191186, -1.432671943129296])
(BTW, `rand` supporting this is also a recent addition, and it is the official way of sampling from the prior.)
We can evaluate the `logjoint` and `logprior` probabilities and the `loglikelihood` of the model for the prior sample:
julia> logjoint(model_original, priorsample_original)
-12.434955305181779
julia> loglikelihood(model_original, priorsample_original)
0.0
julia> logprior(model_original, priorsample_original)
-12.434955305181779
There are no observations here, and hence the `loglikelihood` is zero and the `logjoint` is equal to the `logprior`.
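Since `logjoint = logprior + loglikelihood` by definition, the following check should return `true`:
julia> logjoint(model_original, priorsample_original) ≈ logprior(model_original, priorsample_original) + loglikelihood(model_original, priorsample_original)
true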
Now let us generate some targets:
julia> y = rand(2);
We can use these observations in our model by conditioning the model on them. One can use
julia> model_original_y = model_original | (; y);
which is equivalent to
julia> model_original_y = DynamicPPL.condition(model_original, (; y));
or
julia> model_original_y = DynamicPPL.condition(model_original; y);
The conditioned model will use the provided value of `y` during model execution everywhere it appears in the model. `y` did not become an argument
julia> model_original_y.args
(##arg#405 = Val{:original}(),)
but the conditioning is instead handled via special context information:
julia> model_original_y.context
ConditionContext((y = [-0.15666952029608294, 1.080560882554258],), DynamicPPL.DefaultContext())
Since `y` is fixed now, its value is not returned if we sample from the conditioned model:
julia> sample_original = rand(model_original_y)
(τ = 0.7153433622210013, λ = [3.8969988625444807, 108.0150249313839, 0.26170567941384487, 0.09822866602648996], β = [0.7437719510675507, -58.85226099400912, 0.02501194719031626, -0.18210821639807004], α = 0.5063561352542082, σ = 1.3204010967080868)
Of course, in the conditioned model the `loglikelihood` is non-zero, and the `logjoint` and `logprior` are different:
julia> logjoint(model_original_y, sample_original)
-1203.0847753639716
julia> loglikelihood(model_original_y, sample_original)
-1177.2934647548811
julia> logprior(model_original_y, sample_original)
-25.791310609090537
And, of course, we can now perform inference as usual, e.g., with `NUTS`:
julia> sample(model_original_y, NUTS(), 1_000);
The same approach also works for the other variants:
julia> model_plus = HorseShoePrior(X)(Val(:+));
julia> model_finish = HorseShoePrior(X)(Val(:finish));
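For instance, they can be conditioned on the same observations and sampled with NUTS in exactly the same way (a quick sketch reusing the `y` from above):
julia> model_plus_y = model_plus | (; y);
julia> model_finish_y = model_finish | (; y);
julia> sample(model_plus_y, NUTS(), 1_000);
julia> sample(model_finish_y, NUTS(), 1_000);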