Issue querying probabilities using chain

Hi,

This is my first post, so I will try to make it as clear as possible; please point out anything that is unclear.

I am running into an issue with querying probabilities using chains. I tried to reduce it to as small a problem as possible, which has led me to believe it might be a bug, or at least not the intended behavior. Either that, or I am missing something.

In the code I pasted below, the first 3 results (result1, result2, and result3) all print different values whenever they are rerun as individual cells, even though I would expect them to stay the same since the chain doesn’t change.
The joint probabilities of x and y (result4 and result5), on the other hand, do stay consistent when the cell is rerun, as one would expect.

So the question:

  • Am I doing something wrong here or might this be a bug?

The code below can also be found as a Pluto.jl notebook on github: https://github.com/mgmverburg/Turing_examples/blob/master/potential_query_bug.jl

using Turing, Distributions

@model function gdemo(x, y)
    if x === missing || x === nothing
        # Initialize `x` if missing
        x = Vector{Float64}(undef, 2)
    end
    n = length(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ filldist(Normal(m, sqrt(s)), n)
    for i in 1:length(y)
        y[i] ~ Normal(x[i], sqrt(s))
    end
end

model_gdemo = gdemo([1.0, 0.0], [1.5, 0.0])
c2 = sample(model_gdemo, NUTS(0.65), 100)
result1 = prob"y = [1.5] | chain=c2, model = model_gdemo, x = [1.0]"
println(mean(result1))
result2 = prob"y = [1.5] | chain=c2, model = model_gdemo, x = [0.0]"
println(mean(result2))
result3 = prob"y = [1.5] | chain=c2, model = model_gdemo, x = nothing"
println(mean(result3))
result4 = prob"y = [1.5], x = [1.0] | chain=c2, model = model_gdemo"
println(mean(result4))
result5 = prob"y = [1.5], x = [0.0] | chain=c2, model = model_gdemo"
println(mean(result5))

Thanks in advance!

From your model, I find the following factorization for your joint probability
p(y,x,m,s)=p(x|m,s)p(y|m,s)p(m,s)
So once m and s are given, y does not depend on x: for any value of x, p(y|x,m,s)=p(y|m,s)
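Spelled out, the step from that factorization to the independence statement looks roughly like this. It assumes the query is evaluated for fixed m and s (one pair per chain sample), which is my reading of how the chain is used rather than something stated above:

```latex
\begin{align}
p(y \mid x, m, s)
  &= \frac{p(y, x, m, s)}{p(x, m, s)} \\
  &= \frac{p(x \mid m, s)\, p(y \mid m, s)\, p(m, s)}{p(x \mid m, s)\, p(m, s)} \\
  &= p(y \mid m, s)
\end{align}
```

The value of x cancels, so changing x in the query should not change the result for that version of the model.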

Thank you very much! You are right, my ‘minimum example’ here was wrong. When I went back to my bigger model to check the behavior, the main issue turned out to be different as well. As my updated question now states, the value keeps changing when I compute conditional probability queries, and those queries are also not giving me the correct values. I updated my example and it now correctly shows the ‘unexpected’ behavior. What I previously showed was also unexpected behavior, but caused by my incorrect model xD

TL;DR: Updated my example with the correct ‘incorrect’ behavior

Yeah, that seems to be a bug. You can see that Turing “accidentally” resamples x even though it is passed in the list of values to condition on.

@model function gdemo(x, y)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ filldist(Normal(m, sqrt(s)), length(y))
    @show x
    for i in 1:length(y)
        @show x[i]
        y[i] ~ Normal(x[i], sqrt(s))
    end
end

result1 = prob"y = [1.5] | chain=c2, model = model_gdemo, x = [1.0]"
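
For comparison, here is a rough sketch of the value the conditional query would be expected to return if x were held fixed. In the updated model y[i] depends only on x[i] and s, so for each draw of s the density of y given x is just a normal density. The names normal_pdf and expected_conditional and the example draws of s are made up for illustration; in practice the draws of s would come from the chain c2.

```julia
# Density of Normal(mu, sigma) at y, written out so the sketch needs no packages.
# With Distributions loaded this is the same as pdf(Normal(mu, sigma), y).
normal_pdf(y, mu, sigma) = exp(-0.5 * ((y - mu) / sigma)^2) / (sigma * sqrt(2pi))

# p(y | x, s) for every draw of s. m drops out because y depends only on
# x and s in the updated model.
expected_conditional(s_draws, x, y) = [normal_pdf(y, x, sqrt(s)) for s in s_draws]

# Hypothetical draws of s standing in for the `s` samples stored in `c2`.
s_draws = [0.5, 1.0, 2.0]
reference = expected_conditional(s_draws, 1.0, 1.5)
println(sum(reference) / length(reference))
```

Since nothing here is random once the draws of s are fixed, this number stays the same on every rerun, which is the behavior one would expect from result1 as well.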

Btw., you don’t need to load Distributions, as it is reexported by Turing. Also, the lines in which you initialize x if it is missing are not necessary.

cc: @mohamed82008

Could you file an issue for this at DynamicPPL?

Please open an issue to track this. (oops didn’t see your post Cameron!)

Thanks, I submitted an issue at DynamicPPL! https://github.com/TuringLang/DynamicPPL.jl/issues/190
