Turing conditioning syntax and models using @addlogprob!

Hello! It is my understanding that the preferred syntax for providing observations to Turing models going forward is the condition operator, model() | (; y), instead of explicitly passing the observations as arguments, model(y). However, I do not see how this works when building a model without an explicit likelihood in the code. Consider the model below, which uses a Kalman filter over the observations and the Turing.@addlogprob! macro instead of something like y ~ Distribution(...).

using Turing, LinearAlgebra

# kalman_predict and kalman_update are my own helper functions (not shown here).
@model function kalman(ys, H, F, Q, P, R)
    _, N = size(ys)
    latent_dim = size(H, 2)
    x₀ ~ MvNormal(zeros(latent_dim), I)
    x = x₀
    for t in 1:N
        x, P, y, S = kalman_predict(x, P, H, F, Q, R)
        r = ys[:, t] - y # without ys as an argument this quantity cannot be computed
        x, P, y, S = kalman_update(x, P, r, S, H, R)
        Turing.@addlogprob! -0.5 * (logdet(S) + r' * inv(S) * r)
    end
end

Without explicitly passing ys there is no way to write down the log-probability, so I am curious: is there a way to use the conditioning syntax with this kind of model?
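
For reference, this is the syntax difference I'm referring to, with a toy model (toy_args and toy_cond are made-up names, not my actual code):

# argument style: the observation is baked in when the model is instantiated
@model function toy_args(y)
    m ~ Normal(0, 1)
    y ~ Normal(m, 1)
end
chain = sample(toy_args(1.5), NUTS(), 1000)

# condition style: no data argument; the observation is attached to the model afterwards
@model function toy_cond()
    m ~ Normal(0, 1)
    y ~ Normal(m, 1)
end
chain = sample(toy_cond() | (; y = 1.5), NUTS(), 1000)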

Yeah, that’s unfortunately because we don’t have a great solution for that at the moment.

As a short-term solution, see the code snippets after EDIT 2 in Supporting mutating ADs in models that fill arrays of parameters · Issue #412 · TuringLang/DynamicPPL.jl · GitHub

Using the @isobservation and @value macros there, you should be able to achieve what you want.

These might make their way into the codebase in the not too distant future.

Thank you for explaining the situation. I’ll probably wait to make changes until an API becomes public.

Could you explain why Turing is moving to the newer syntax? It seems to introduce some new complexity, so I assume there is a good reason.

A similar issue arises when one wants to compute pointwise log-likelihoods. Since no observations are passed to the model, Turing.pointwise_loglikelihoods cannot be used. I stumbled upon this when attempting to use ArviZ to compute loo.

I want to clarify that I am not posting here as a cry for fixes, but I think it is helpful to articulate these use cases somewhere persistent since it lets them be addressed once and for all.

Mainly because it’s more user-friendly and much easier to work with programmatically.

For the former, it’s just nice to be able to tell the model to no longer treat a variable as an observation (you can do this using decondition), e.g. if you want to look at the posterior predictive. The argument-based way of doing this is to pass missing in its place, but that so often goes wrong, and we’ve seen users get quite confused by it.
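
Something along these lines (a minimal sketch with a made-up model, simple_model, just to illustrate the pattern):

using Turing

@model function simple_model()
    m ~ Normal(0, 1)
    y ~ Normal(m, 1)
end

conditioned_model = simple_model() | (; y = 1.5)  # y is now treated as an observation
chain = sample(conditioned_model, NUTS(), 1000)

# Posterior predictive: drop the observation again, so y is sampled rather than
# conditioned on. No need to re-instantiate the model with `missing` in y's place.
predictive_model = decondition(conditioned_model, :y)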

When observations are passed as arguments, it’s not possible to query the model itself about what it considers an observation without executing it (we have to run the model to determine which of the arguments end up on the left-hand side of a ~). In contrast, if you use the model | ... or condition(model, ...) syntax, the observations are effectively attached to the model in a way that we can then easily inspect programmatically.
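
For instance, continuing the sketch above (I’m using DynamicPPL.conditioned as the accessor here; the exact name may differ between versions):

using DynamicPPL

# Read back what the model is conditioned on, without ever running it.
DynamicPPL.conditioned(conditioned_model)  # (y = 1.5,)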

For example, we are working on simplifying the inner workings of our Gibbs sampler, where we now can simply use the condition(model, ...) syntax to sequentially fix different variables.

With all that being said, in the Kalman-filter example you give above it’s very understandable that you’d want to pass ys as an argument. But in this case your ys never appears on the LHS of a ~, so Turing won’t consider this argument an observation.

So this has nothing to do with condition vs. pass-obs-as-arg, but it’s just the restrictions you get when using @addlogprob!. This, AFAIK, is the same in every other PPL, i.e. if you use the "manual accumulation of log-probability", then you lose some of the convenience.

BUT, we can "fix" this! If you want to make Turing consider your ys as observations, for things like loo, we can implement a "distribution" that effectively does what @addlogprob! does, but at the same time, since it’s a Distribution, signals to Turing.jl that "hey, this thing on the LHS of this ~ is an observation!".

"""
    DiracWithProb(value[, logp])

A Dirac delta distribution with a single point mass at `value` and `logpdf` equal to `logp`.

This is basically the same as `Distributions.Dirac` but without
being restricted to univariate values, in addition to having a non-trivial `logpdf`.

!!! warning
    The non-trivial `logpdf` makes it an improper distribution, but it can be useful
    in certain scenarios when used in a model.
"""
struct DiracWithProb{T1,T2,V} <: Distributions.DiscreteDistribution{V}
    value::T1
    logp::T2
end

variateform(x::Real) = Distributions.Univariate
variateform(x::AbstractArray{<:Real,N}) where {N} = Distributions.ArrayLikeVariate{N}

DiracWithProb(x) = DiracWithProb(x, zero(eltype(x)))
DiracWithProb(x, logp) = DiracWithProb{typeof(x),typeof(logp),variateform(x)}(x, logp)

const UnivariateDiracWithProb{T1,T2} = DiracWithProb{T1,T2,Distributions.Univariate}
const ArrayVariateDiracWithProb{T1,T2,N} = DiracWithProb{T1,T2,Distributions.ArrayLikeVariate{N}}

Base.eltype(::Type{<:DiracWithProb{T}}) where {T} = T

# Some required stuff from Distributions.jl.
Distributions.insupport(d::UnivariateDiracWithProb, x::Real) = x == d.value
Distributions.insupport(d::DiracWithProb, x::AbstractArray) = x == d.value

Base.minimum(d::DiracWithProb) = d.value
Base.maximum(d::DiracWithProb) = d.value
Distributions.support(d::DiracWithProb) = (d.value,)

# Evaluation
Distributions.pdf(d::UnivariateDiracWithProb, x::Real) = insupport(d, x) ? exp(d.logp) : zero(d.logp)
Distributions.logpdf(d::UnivariateDiracWithProb, x::Real) = insupport(d, x) ? d.logp : typeof(d.logp)(-Inf)

function Distributions.pdf(d::ArrayVariateDiracWithProb{<:Any,<:Any,N}, x::AbstractArray{<:Real,N}) where {N}
    return insupport(d, x) ? exp(d.logp) : zero(d.logp)
end
function Distributions.logpdf(d::ArrayVariateDiracWithProb{<:Any,<:Any,N}, x::AbstractArray{<:Real,N}) where {N}
    return insupport(d, x) ? d.logp : typeof(d.logp)(-Inf)
end

# Sampling
Base.rand(rng::Random.AbstractRNG, d::UnivariateDiracWithProb) = d.value
Base.rand(rng::Random.AbstractRNG, d::ArrayVariateDiracWithProb) = d.value

Equipped with this, we can do stuff like

julia> @model function demo(x, y)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, √s)
           x ~ DiracWithProb(x, logpdf(Normal(m, √s), x))
           y ~ DiracWithProb(y, logpdf(Normal(m, √s), y))
       end

demo (generic function with 4 methods)

julia> model = demo(1.5, 2);

julia> chain = sample(model, NUTS(), 1000)
┌ Info: Found initial step size
└   ϵ = 0.8
Sampling 100%|████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (1000×14×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 0.08 seconds
Compute duration  = 0.08 seconds
parameters        = s, m
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64 

           s    1.9040    1.4629    0.0561   721.9248   664.9316    0.9992     9499.0100
           m    1.1291    0.7297    0.0314   556.9236   473.0818    1.0012     7327.9420

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           s    0.5666    1.0414    1.4280    2.2449    6.0441
           m   -0.4221    0.7014    1.1622    1.5815    2.5344


julia> pointwise_loglikelihoods(model, MCMCChains.get_sections(chain, :parameters))
Dict{String, Matrix{Float64}} with 2 entries:
  "x" => [-1.11969; -0.859207; … ; -1.1811; -1.15306;;]
  "y" => [-1.20779; -1.31884; … ; -0.877885; -0.944768;;]

which will allow it to work with loo, etc.
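
To connect this back to the Kalman-filter model at the top of the thread, here is a sketch of how one could replace the @addlogprob! call with a per-time-step DiracWithProb observation, so that each column of ys is registered as an observation (this assumes the DiracWithProb definition above and the same kalman_predict / kalman_update helpers as in the original post, and drops the same additive constants):

using Turing, LinearAlgebra

@model function kalman(ys, H, F, Q, P, R)
    _, N = size(ys)
    latent_dim = size(H, 2)
    x₀ ~ MvNormal(zeros(latent_dim), I)
    x = x₀
    for t in 1:N
        x, P, y, S = kalman_predict(x, P, H, F, Q, R)
        r = ys[:, t] - y
        x, P, y, S = kalman_update(x, P, r, S, H, R)
        # Same per-step log-probability as before, but now attached to ys[:, t] as an
        # observation, so pointwise_loglikelihoods (and hence loo) can see it.
        ys[:, t] ~ DiracWithProb(ys[:, t], -0.5 * (logdet(S) + r' * inv(S) * r))
    end
end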

@torfjelde Thanks for that great code example!
Is this available on the Turing website as well? I assume it could be a helpful example for more people than just the ones in this thread.

I sometimes feel that these "smart" code snippets are limited to Discourse and difficult to find again.

It’s not there yet, but I’m literally working on it as we speak :) It might take a few days to get it ready though. Will post a link here once it’s done.
