# Some question about the structure of Turing models

Hello,
given a Turing model

``````@model function λ(y)
λ₁ ~ Exponential(alpha)
for i in 1:length(y)
y[i] ~ Poisson(λ₁)
end
end

``````

what is the meaning of the y array?
I think it is used to compute the posterior.
When I call a sampler, y will be the data. So in the model strucutre the data is rewrite by the distribution?
Question 0: How does it works exactly?
Question 1: How can I plot the y distribution?
Calling plot(sample(myModel(data), sampler(), n)) does not plot y.
Using PyMC3 the posterior distribution can be plotted as described here:

(figure at “4. Plot the artificial dataset:”)

Question 2: Can I use map/broadcast instead of

``````    for i in 1:length(y)
y[i] ~ Poisson(λ₁)
end
``````

?

Question 0 :
What Turing does is : given the data `y` it tries to infer the posterior of `λ₁`, to do that it is going to sample from the posterior using your sampler.
Question 1:
The chain obtained by `sample` only contains the unknown variables (here `λ₁`). You would have to create a separate plot for `y`. For example by using the `histogram` function from `Plots.jl`
Question 2:
I think you can use `y ~ filldist(Poisson(λ₁), length(y))`

1 Like

Thank you for answering.

What Turing does is : given the data `y` it tries to infer the posterior of `λ₁` , to do that it is going to
sample from the posterior using your sampler.
So in Turing assign with ~ to the data array(s) a distribution means “infer the posterior of the used variables”; is correct? How Turing use the data to compute the infence?

For example by using the `histogram` function from `Plots.jl`

``````import Plots
s=e = sample(λ(count_data), HMC(0.01, 10), 10000)
histogram(s)
``````

It plots again only unknown variables.

I meant to use `histogram` on `y`. Since `y` is your data, it’s fixed, you don’t have to sample from it.

So in Turing assign with ~ to the data array(s) a distribution means “infer the posterior of the used variables”; is correct? How Turing use the data to compute the infence?

~ assigns a distribution to a random variable (eventually conditioned on data or other variables). What sampling does is returning samples of the posterior on your random variables, in your case p(lambda_1 | y)

I would like to plot the excepted data not my data. (The red line in this plot)
Example in PyMC3:

I see what you want now but this not what your model would be doing.
I guess the easiest to reproduce this plot would be something like :

``````s = sample(λ(count_data), HMC(0.01, 10), 10000)
bar(count_data)
hline!([mean(s[:λ₁])])
``````

Hi

thanks for the answer here, but it is still a bit “strange” to the eyes of a computer scientists.

First of all the `y` appears “passed in” and actually modified in place. Next the `for` loop is executed exactly when? At initialization of the model? (Is the `filldist` actually the same as the `for` loop?)

What would `count_data` contain after the sampling of the posterior is done?

Sorry for being obdurate, but things are a bit confusing to me.

MA

The tilde symbol indicates that y is distributed according to a Poisson distribution. As you provide y to the function as a function argument, all values for y are observed and thus Turing will condition on those values.

count_data will therefore not be changed but conditioned on.

filldist is a function to simplify a for loop, in case the distribution is always the same. Think of fill in Julia. filldist is the analog for distributions. This can be more performant than a for loop (this is mostly because of how AD works in Julia) but it is not strictly necessary to use.

Than you for the reply…

And yet, I believe that given the following (which is a procedural “abstraction/notion/tool/thing”) makes things look kinda funny.

If, as you say, `y` is passed as a function argument and as such it is “observed” and Turing will condition on it. If it is an observation, the “declaration” that each of its element are distributed as a Poisson conflates, IMHO, two things. I am sorry but the unlabelled mixing of declarative style and the procedural one is confusing; at least to me, and I do not thing it is much improved by the use of `filldist`.

At a minimum, it should be explained in a better way.

All the best
Marco

Why do you feel that the declaration that y is Poisson distributed is confusing? Maybe I can help clear out the confusion.

As mentioned, filldist is chosen to be analog to fill, which is a Julia base function, and does not have to be used. It’s main relevance is due to reverse mode AD, which is Turing unrelated.

You can read more about the use of filldist here: https://turing.ml/dev/docs/using-turing/performancetips (see section Ensure that types in your model can be inferred) if you are interested in scenarios where filldist helps to simplify the implementation.

Thank you Martin for your reply.

Let me answer to the two questions separately. Before that let me also make a note about the documentation and the tutorials.

I am notoriously lazy, but in order to understand exactly a `model` is required me a thorough poring over the tutorials. OK, I know I should RTFM, but then again some initial explanations would be welcome in the ‘Getting Started’ section and open questions remain; e.g., is the expr in a `model` always a `function`?

But let’s go back to the `for` loop where you “declare” that each of the vector components are distributed Poisson, Bernoulli, Normal or anything else.

IMHO, a “declaration” should not be built using something as procedural as a `for` loop; even conceding that the order of declarations is important. Of course, I did not designed Turing, but I am just commenting on it.

The same applies to the use of `filldist`; IMHO, it still confuses the regular computer scientist guy, like me, as it is again a non-declarative invocation of a library function.

In other words, I think that within the `model` macro some other local macro could be set up to declare that a name denotes a vector of variables with a given distribution.

Not sure what to propose, maybe something as simple as

``````y[] ~ Bernoulli(x)
``````

may work and indicate that `y`is going to be a vector. Macrology can surely give you this.

Just my 2 cents.

All the best

Marco

The documentation could definitely be improved. To answer your question, the `model` macro only acts on function definitions, so yes the expression after `@model` always has to be a function. I’m not sure we mention that anywhere, that’s a good point.

That’s an interesting point. I’ll cc: @mohamed82008 to the discussion. It would be interesting to move even closer to a mathematical notation of the process, e.g.

``````@model function test(y)
λ ~ Exponential(α)
y[i] ~ Bernoulli(λ)
end
``````

(again without the use of a for loop).

Or a more complex example:

``````@model function test(y; K=2, V=3, μ₀ = 0.0, σ₀ = 10.0)
α ~ Gamma()
β ~ Gamma()

ν ~ Dirichlet(V, β/V)
c[d] ~ Categorical(ν)

w[v] ~ Dirichlet(K, α/K) # 1 < v <= V
z[v,i] ~ Categorical(w[v])

μ[v,k] ~ Normal(μ₀, σ₀) # 1 < v <= V, 1 < k <= K

y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end
``````

but as we see this get’s a bit more difficult to declare and more elaborate models might be even more challenging.

Maybe independent of those ideas, to understand why we don’t do this atm. The main reason for this is that wanted Turing to act on standard Julia functions and only add minimal changes to the code written by the user. Thus, making it easier to debug the code, optimise it by hand and integrate arbitrary Julia packages. Those points will potentially get more difficult with approaches as outlined above.

In Soss this would be

``````test = @model n begin
λ ~ Exponential(α)
y ~ Bernoulli(λ) |> iid(n)
end
``````

or

``````test = @model n begin
λ ~ Exponential(α)
y ~ For(n) do j Bernoulli(λ) end
end
``````

But there’s really nothing Soss-specific about `iid` and `For`; in principle it should be quick to get this working in Turing:

``````@model function test(y)
n = length(y)
λ ~ Exponential(α)
y ~ Bernoulli(λ) |> iid(n)
end
``````

How would you write the second model?

What if `i` has a value and we actually want to do a time-series style model? It’s almost impossible to parse this correctly as suggested here without an extra hint, e.g. the `@einsum` notation from the tensor packages (e.g. https://github.com/ahwillia/Einsum.jl). But then should we create a new variable `y` or mutate an existing variable `y`? This is still hard to determine correctly at parse-time but not impossible. For example, we can see if `y` has been defined or not. If not, we define it. Since `~` can only be used with variables that are usually defined inside the model for AD compatability or observations from the arguments, we have almost all the information we need at the macro-level. We would just need to assume that `y` isn’t obscurely defined using a macro, which is rare so that maybe an acceptable tradeoff.

Tbh, I think it doesn’t get much cleaner than the `Soss` syntax here. So maybe we can just have `iid` do something like `filldist` under the hood, all the AD efficiency tricks I mean. But the tensor notation idea can also be neat.

2 Likes

This one?

``````@model function test(y; K=2, V=3, μ₀ = 0.0, σ₀ = 10.0)
α ~ Gamma()
β ~ Gamma()

ν ~ Dirichlet(V, β/V)
c[d] ~ Categorical(ν)

w[v] ~ Dirichlet(K, α/K) # 1 < v <= V
z[v,i] ~ Categorical(w[v])

μ[v,k] ~ Normal(μ₀, σ₀) # 1 < v <= V, 1 < k <= K

y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end
``````

I guess it would be something like

``````test = @model D,I,K,V, μ₀, σ₀ begin
α ~ Gamma()
β ~ Gamma()

ν ~ Dirichlet(V, β/V)
c ~ Categorical(ν) |> iid(D)

w ~ Dirichlet(K, α/K) |> iid(V)
z ~ For(V,I) do v,i
Categorical(w[v])
end

μ ~ Normal(μ₀, σ₀) |> iid(V,K)

y ~ For(D,I) do d,i
Normal(μ[c[d], z[c[d],i]], 0.5)
end
end
``````

Yes, that’s what I was try to hint to in the second example. And yes, when writing it I was thinking about the Einsum package.

1 Like

Oh, I just thought it would be interesting to consider and see what are the issues related to it, and there are plenty.

Yes, the iid syntax would be great to have. @cscherrer would it be possible to integrate that into Turing?

Do you need your tailored For function here?