Some questions about the structure of Turing models

Hello,
given a Turing model

@model function λ(y)
    λ₁ ~ Exponential(alpha)  # `alpha` is assumed to be defined in the enclosing scope
    for i in 1:length(y)
        y[i] ~ Poisson(λ₁)
    end
end

what is the meaning of the y array?
I think it is used to compute the posterior.
When I call a sampler, y will be the data. So in the model structure, is the data overwritten by the distribution?
Question 0: How does it work exactly?
Question 1: How can I plot the y distribution?
Calling plot(sample(myModel(data), sampler(), n)) does not plot y.
Using PyMC3 the posterior distribution can be plotted as described here:


(figure at “4. Plot the artificial dataset:”)

Question 2: Can I use map/broadcast instead of

    for i in 1:length(y)
        y[i] ~ Poisson(λ₁)
    end

?

Question 0:
What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
Question 1:
The chain obtained by sample only contains the unknown variables (here λ₁). You would have to create a separate plot for y, for example by using the histogram function from Plots.jl.
Question 2:
I think you can use y ~ filldist(Poisson(λ₁), length(y))
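To make that concrete, here is the full model with the loop replaced by filldist (a sketch, assuming Turing.jl is loaded and `alpha` is defined in scope):

```julia
using Turing

@model function λ(y)
    λ₁ ~ Exponential(alpha)  # `alpha` assumed defined in the enclosing scope
    # One multivariate statement instead of length(y) scalar ~ statements:
    y ~ filldist(Poisson(λ₁), length(y))
end
```

The observed/conditioned meaning of y is unchanged; only the way the likelihood is written differs.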


Thank you for answering.

What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
So in Turing, assigning a distribution to the data array(s) with ~ means "infer the posterior of the variables used"; is that correct? How does Turing use the data to compute the inference?

For example by using the histogram function from Plots.jl

using Plots
s = sample(λ(count_data), HMC(0.01, 10), 10000)
histogram(s)

It again plots only the unknown variables.

I meant to use histogram on y. Since y is your data, it’s fixed, you don’t have to sample from it.

So in Turing, assigning a distribution to the data array(s) with ~ means "infer the posterior of the variables used"; is that correct? How does Turing use the data to compute the inference?

~ assigns a distribution to a random variable (possibly conditioned on data or other variables). What sampling does is return samples from the posterior over your random variables, in your case p(λ₁ | y).
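As a side check (my own sketch, not something the thread relies on): this particular model is conjugate, so p(λ₁ | y) has a closed form you can compare the sampler against. Assuming Exponential(alpha) is the scale parameterization (as in Distributions.jl), the prior is a Gamma with shape 1 and rate 1/alpha, and the posterior is Gamma(1 + sum(y), rate = 1/alpha + length(y)):

```julia
# Conjugate update for a Poisson likelihood with a Gamma(1, rate = 1/alpha)
# prior, i.e. Exponential(alpha) in the scale parameterization.
function posterior_params(y; alpha = 1.0)
    shape = 1 + sum(y)             # prior shape 1 plus total counts
    rate  = 1 / alpha + length(y)  # prior rate plus number of observations
    return shape, rate
end

# Posterior mean of λ₁ is shape / rate.
function posterior_mean(y; alpha = 1.0)
    shape, rate = posterior_params(y; alpha = alpha)
    return shape / rate
end
```

With enough samples, mean(s[:λ₁]) should be close to this posterior mean.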

I would like to plot the expected data, not my data. (The red line in this plot)
Example in PyMC3:

I see what you want now, but this is not what your model would be doing.
I guess the easiest way to reproduce this plot would be something like:

using Plots, Statistics

s = sample(λ(count_data), HMC(0.01, 10), 10000)
bar(count_data)
hline!([mean(s[:λ₁])])
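If you want the model's predicted counts rather than just a posterior-mean line, Turing can also draw posterior-predictive samples: re-instantiate the model with missing observations and call predict. A sketch, assuming the λ model above (the element type Union{Missing, Int} is my choice, not something the thread specifies):

```julia
using Turing

s = sample(λ(count_data), HMC(0.01, 10), 10000)

# With missing observations, y is sampled instead of conditioned on,
# so predict returns posterior-predictive draws of y.
y_missing = Vector{Union{Missing, Int}}(missing, length(count_data))
y_pred = predict(λ(y_missing), s)
```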

Hi

thanks for the answer here, but it is still a bit "strange" to the eyes of a computer scientist.

First of all, y appears to be "passed in" and actually modified in place. Next, when exactly is the for loop executed? At initialization of the model? (Is filldist actually the same as the for loop?)

What would count_data contain after the sampling of the posterior is done?

Sorry for being obdurate, but things are a bit confusing to me.

MA

The tilde symbol indicates that y is distributed according to a Poisson distribution. As you provide y to the function as a function argument, all values for y are observed and thus Turing will condition on those values.

count_data will therefore not be changed but conditioned on.
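Roughly, and glossing over Turing's actual internals, you can picture the density the sampler works with like this (a plain-Julia sketch; the function names are mine):

```julia
# Unnormalized log-posterior of the model above, written out by hand.
# The observed y only contributes fixed log-likelihood terms:
# it is read, never written. λ₁ is the only free variable.
poisson_logpmf(y, λ) = y * log(λ) - λ - log(factorial(y))
exponential_logpdf(x, θ) = x < 0 ? -Inf : -log(θ) - x / θ

function logjoint(λ₁, y; alpha = 1.0)
    lp = exponential_logpdf(λ₁, alpha)          # prior: λ₁ ~ Exponential(alpha)
    lp += sum(yi -> poisson_logpmf(yi, λ₁), y)  # likelihood: y[i] ~ Poisson(λ₁)
    return lp
end
```

The for loop in the model runs each time this density is evaluated during sampling; it never changes count_data.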

filldist is a function to simplify a for loop, in case the distribution is always the same. Think of fill in Julia. filldist is the analog for distributions. This can be more performant than a for loop (this is mostly because of how AD works in Julia) but it is not strictly necessary to use.

Thank you for the reply…

And yet, I believe that the construct above (which is a procedural "abstraction/notion/tool/thing") makes things look kinda funny.

If, as you say, y is passed as a function argument and as such is "observed", then Turing will condition on it. But if it is an observation, the "declaration" that each of its elements is distributed as a Poisson conflates, IMHO, two things. I am sorry, but the unlabelled mixing of the declarative style and the procedural one is confusing, at least to me, and I do not think it is much improved by the use of filldist.

At a minimum, it should be explained in a better way.

All the best
Marco

Why do you feel that the declaration that y is Poisson distributed is confusing? Maybe I can help clear out the confusion. :slight_smile:

As mentioned, filldist is chosen to be analogous to fill, which is a Julia base function, and does not have to be used. Its main relevance is due to reverse-mode AD, which is unrelated to Turing.

You can read more about the use of filldist here: https://turing.ml/dev/docs/using-turing/performancetips (see section Ensure that types in your model can be inferred) if you are interested in scenarios where filldist helps to simplify the implementation.