Turing.jl: Confusion about random variables and draws from distributions

Hi, I’m just starting out with probabilistic programming in general and Turing.jl specifically, so I am still in the process of wrapping my head around the logic of it.

I want to build a simple model where a yes/no response is modelled given some parameter that I’m interested in fitting. The idea is that a person might have a prior expectation for some parameter that’s then combined with the evidence coming in, leading to a decision. The evidence should have an uncertainty associated with it, and I want to know what uncertainty explains the behavior.

Here’s a basic program, but it doesn’t work yet; I think it tries to fit a single b_guess value, even though that is just supposed to be an intermediate quantity from combining the two distributions. as and bs are experimental variables.

using Turing
using StatsFuns: logistic

model = @model function response_model(as, bs, responses)

    # this is like a population wide hyperparameter
    σ²_a = 0.1
    # weak prior, this variable is the one of interest
    σ²_b ~ truncated(Normal(0, 100), 0, Inf)

    for i in eachindex(as)
        a_dist = Normal(as[i], sqrt(σ²_a))
        b_dist = Normal(bs[i], sqrt(σ²_b))
        combined = combine_distributions(a_dist, b_dist)
        # here's the problem, I just need b_guess as an intermediate thing, I don't want to fit it
        b_guess ~ combined
        # the guess vs the real b value is fed into the logistic function
        p = logistic(b_guess - bs[i])
        # and then into the bernoulli distribution
        responses[i] ~ Bernoulli(p)
    end
end

# Combine two normals by precision weighting, i.e. the renormalized
# product of the two densities:
function combine_distributions(n1::Normal, n2::Normal)
    t = 1 / n1.σ ^ 2  # precision of n1
    u = 1 / n2.σ ^ 2  # precision of n2
    m = n1.μ
    n = n2.μ
    Normal((t * m + u * n) / (t + u), sqrt(1 / (t + u)))
end
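As a quick sanity check of combine_distributions: combining two unit-variance normals should average the means and halve the variance, which it does:

d = combine_distributions(Normal(0, 1), Normal(1, 1))
# equal precisions: means average, variance halves
# d == Normal(0.5, sqrt(0.5))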

I think there is no way around it: even though b_guess might not be of interest, the only alternative to ‘fitting’ it is basically generating it at random, in which case it won’t work either. So in practice you will need to fit b_guess, and it will indeed just be an intermediate parameter that you probably won’t inspect afterwards.

Regardless, when I did similar things I typically wrote them the same way at first, only to realize it didn’t work. The reason is simple: you currently do b_guess ~ combined inside the loop, i.e. length(as) times, but only the first time will it actually be sampled; every subsequent time b_guess is already known, so it won’t be resampled.
At least, that is what I figure happens, because in similar situations I noticed that the value didn’t change (this can be verified with a println and a very low number of iterations, so you don’t get spammed too much).
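Something along these lines (a toy sketch of mine, and the exact behavior might depend on your Turing version) should make it visible:

using Turing

# Toy model, names made up: `z` is a single scalar name used on every
# iteration of the loop, so it should print the same value for each `i`
# if it isn't resampled.
@model function resample_check(xs)
    for i in eachindex(xs)
        z ~ Normal(0, 1)
        println("iteration $i: z = $z")
        xs[i] ~ Normal(z, 1)
    end
end

sample(resample_check(randn(3)), MH(), 2)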
So in that case, the solution is as follows: define the array variable for b_guess outside of the loop, like so:
b_guess = Array{T}(undef, length(as))
This will also require adding T to the function definition as follows:

model = @model function response_model(as, bs, responses, ::Type{T} = Float64) where {T}

And then lastly, inside the for-loop you sample into it like this:
b_guess[i] ~ combined

Let me know if it works! (Don’t ask me exactly why and how the T business works; it has something to do with efficiency and type stability, and I always forget the precise reasoning, but it should be mentioned somewhere, maybe on the Turing website as well.)

EDIT: it is actually mentioned on the website, under Performance Tips. The gist, as I understand it, is that with AD-based samplers the sampled values may not be plain Float64 (e.g. dual numbers), so the container’s element type needs to be a parameter rather than hardcoded.


What @michielver said is indeed correct :)

I’ll add a couple of notes too:

  1. model = @model function response_model(as, bs, responses): the @model macro will define a function called response_model, and so there’s no need to assign the return-value here to model. You can just use response_model(...) to instantiate it.
  2. As mentioned above, inference has to be done jointly over all the variables, i.e. even the variables that are, as you say, “intermediate” need to be considered. The alternative is marginalizing those intermediate variables out; in general this is not possible, but in this particular case you can, if you’re willing to change your model a bit. Instead of using responses[i] ~ Bernoulli(logistic(b_guess - bs[i])), you can do responses[i] = (b_guess - bs[i]) < 0 ? 0 : 1, i.e. responses[i] is assigned 0 if (b_guess - bs[i]) is less than 0 and 1 if it’s greater than or equal to 0. You can then marginalize out b_guess and instead write responses[i] ~ Bernoulli(1 - cdf(combined, bs[i])) (1 - cdf because p is the probability of success, i.e. the probability that (b_guess - bs[i]) ≥ 0), dropping b_guess from the model completely. This also results in a lower-variance estimator (read: a model that is easier to infer parameters for).

So, all in all, I’d suggest:

@model function response_model(as, bs, responses, ::Type{TV} = Vector{Float64}) where {TV}

    # this is like a population wide hyperparameter
    σ²_a = 0.1
    # weak prior, this variable is the one of interest
    σ²_b ~ truncated(Normal(0, 100), 0, Inf)

    # pre-allocate the intermediate values outside the loop
    b_guess = TV(undef, length(as))
    for i in eachindex(as)
        a_dist = Normal(as[i], sqrt(σ²_a))
        b_dist = Normal(bs[i], sqrt(σ²_b))
        combined = combine_distributions(a_dist, b_dist)
        # each observation now gets its own b_guess[i], sampled exactly once
        b_guess[i] ~ combined
        # the guess vs the real b value is fed into the logistic function
        p = logistic(b_guess[i] - bs[i])
        # and then into the bernoulli distribution
        responses[i] ~ Bernoulli(p)
    end
end

if you want to keep the model as is (this is the suggestion of @michielver).
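For completeness, instantiating and sampling it could then look something like this (the data here is made up, and you’d need combine_distributions and the logistic import from above in scope):

using Turing
using StatsFuns: logistic

# made-up stand-in data; the real as/bs/responses come from the experiment
as = randn(100)
bs = randn(100)
responses = rand(0:1, 100)

# note: we call response_model(...) directly, no `model = @model ...` needed
chain = sample(response_model(as, bs, responses), NUTS(), 1_000)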

But if you’re willing to change your model to instead consider

# NOTE: This doesn't actually work, since we can't compute
# the `logpdf` for `responses[i]` here, i.e. we don't know the likelihood.
@model function response_model(as, bs, responses)

    # this is like a population wide hyperparameter
    σ²_a = 0.1
    # weak prior, this variable is the one of interest
    σ²_b ~ truncated(Normal(0, 100), 0, Inf)

    for i in eachindex(as)
        a_dist = Normal(as[i], sqrt(σ²_a))
        b_dist = Normal(bs[i], sqrt(σ²_b))
        combined = combine_distributions(a_dist, b_dist)
        # b_guess is still just an intermediate quantity here
        b_guess ~ combined
        # deterministic response: 1 if b_guess ≥ bs[i], else 0
        responses[i] = (b_guess - bs[i]) < 0 ? 0 : 1
    end
end

for which we can marginalize out the b_guess to get:

@model function response_model(as, bs, responses)
    # this is like a population wide hyperparameter
    σ²_a = 0.1
    # weak prior, this variable is the one of interest
    σ²_b ~ truncated(Normal(0, 100), 0, Inf)

    for i in eachindex(as)
        a_dist = Normal(as[i], sqrt(σ²_a))
        b_dist = Normal(bs[i], sqrt(σ²_b))
        combined = combine_distributions(a_dist, b_dist)
        # and then into the bernoulli distribution
        # `(b_guess - bs[i]) < 0` is equivalent to `b_guess < bs[i]`
        responses[i] ~ Bernoulli(1 - cdf(combined, bs[i]))
    end
end
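And as a quick sanity check of the marginalization step (numbers made up): the analytic 1 - cdf agrees with a Monte Carlo estimate of the probability that b_guess ≥ bs[i]:

using Distributions, Statistics

combined = Normal(0.3, 0.8)                   # stand-in for `combined`
b = 0.5                                       # stand-in for `bs[i]`
analytic = 1 - cdf(combined, b)               # closed-form P(b_guess ≥ b)
montecarlo = mean(rand(combined, 10^6) .>= b) # sampling-based estimate
# `analytic` and `montecarlo` agree up to Monte Carlo error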

Thank you, the cdf was the missing puzzle piece for me. It makes a lot of sense to condense the random sampling I envisioned into that function, since my sampling can’t get more accurate than the analytical version anyway. It seems to work so far for the dataset I’m using.

Fitting b_guess = TV(undef, length(as)) wasn’t really feasible, since as and bs have a length of around 10,000 (maybe it would have been fine, but I was too impatient). The marginalized model runs in a fraction of a second.