Some questions about the structure of Turing models

Hello,
given a Turing model

@model function λ(y)
    λ₁ ~ Exponential(alpha)  # alpha is assumed to be defined in the enclosing scope
    for i in 1:length(y)
        y[i] ~ Poisson(λ₁)
    end
end

what is the meaning of the y array?
I think it is used to compute the posterior.
When I call a sampler, y will be the data. So in the model structure, is the data overwritten by the distribution?
Question 0: How does it work exactly?
Question 1: How can I plot the y distribution?
Calling plot(sample(myModel(data), sampler(), n)) does not plot y.
Using PyMC3 the posterior distribution can be plotted as described here:

(figure at “4. Plot the artificial dataset:”)

Question 2: Can I use map/broadcast instead of

    for i in 1:length(y)
        y[i] ~ Poisson(λ₁)
    end

?

Question 0:
What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
Question 1:
The chain obtained by sample only contains the unknown variables (here λ₁). You would have to create a separate plot for y, for example by using the histogram function from Plots.jl.
Question 2:
I think you can use y ~ filldist(Poisson(λ₁), length(y))
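For example, the whole model could then be written as (a sketch, keeping alpha from your definition above):

@model function λ(y)
    λ₁ ~ Exponential(alpha)
    y ~ filldist(Poisson(λ₁), length(y))
end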


Thank you for answering.

What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
So in Turing, assigning a distribution with ~ to the data array(s) means “infer the posterior of the variables used”; is that correct? How does Turing use the data to compute the inference?

For example, by using the histogram function from Plots.jl

using Plots

s = sample(λ(count_data), HMC(0.01, 10), 10000)
histogram(s)

Again, it only plots the unknown variables.

I meant to use histogram on y. Since y is your data, it’s fixed; you don’t have to sample from it.
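That is, something like (using count_data as in your snippet):

histogram(count_data)  # plot the observed data directly; no sampling involved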

So in Turing, assigning a distribution with ~ to the data array(s) means “infer the posterior of the variables used”; is that correct? How does Turing use the data to compute the inference?

~ assigns a distribution to a random variable (possibly conditioned on data or other variables). What sampling does is return samples from the posterior over your random variables, in your case p(λ₁ | y).

I would like to plot the expected data, not my data (the red line in this plot).
Example in PyMC3:

I see what you want now, but this is not what your model would be doing.
I guess the easiest way to reproduce this plot would be something like:

using Plots, Statistics  # for bar/hline! and mean

s = sample(λ(count_data), HMC(0.01, 10), 10000)
bar(count_data)
hline!([mean(s[:λ₁])])
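If you also want the predictive spread rather than just the posterior mean, a rough sketch (my assumption about the chain indexing, mirroring mean(s[:λ₁]) above) could draw posterior predictive samples:

λs = vec(s[:λ₁])                           # posterior draws of λ₁
y_pred = [rand(Poisson(λᵢ)) for λᵢ in λs]  # one predictive count per posterior draw
histogram(y_pred)                          # posterior predictive distribution of a count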

Hi

thanks for the answer here, but it is still a bit “strange” to the eyes of a computer scientist.

First of all, y appears to be “passed in” and actually modified in place. Next, when exactly is the for loop executed? At initialization of the model? (Is filldist actually the same as the for loop?)

What would count_data contain after the sampling of the posterior is done?

Sorry for being obdurate, but things are a bit confusing to me.

MA

The tilde symbol indicates that y is distributed according to a Poisson distribution. As you provide y to the function as a function argument, all values for y are observed and thus Turing will condition on those values.

count_data will therefore not be changed but conditioned on.

filldist is a function to simplify a for loop, in case the distribution is always the same. Think of fill in Julia. filldist is the analog for distributions. This can be more performant than a for loop (this is mostly because of how AD works in Julia) but it is not strictly necessary to use.
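For intuition, a minimal sketch of that analogy (filldist is re-exported by Turing; the numbers are just illustrative):

using Turing, Distributions

fill(0.5, 3)                   # [0.5, 0.5, 0.5]
d = filldist(Poisson(0.5), 3)  # one distribution over length-3 vectors of iid Poisson(0.5) draws
rand(d)                        # e.g. [0, 1, 0]
logpdf(d, [0, 1, 0])           # joint log-probability of the whole vector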

Thank you for the reply…

And yet, I believe that the following (which is a procedural “abstraction/notion/tool/thing”) makes things look kinda funny.

As you say, y is passed as a function argument, and as such it is “observed” and Turing will condition on it. But if it is an observation, the “declaration” that each of its elements is distributed as a Poisson conflates, IMHO, two things. I am sorry, but the unlabelled mixing of the declarative style and the procedural one is confusing, at least to me, and I do not think it is much improved by the use of filldist.

At a minimum, it should be explained in a better way.

All the best
Marco

Why do you feel that the declaration that y is Poisson distributed is confusing? Maybe I can help clear up the confusion. :slight_smile:

As mentioned, filldist is chosen to be analogous to fill, which is a Julia base function, and does not have to be used. Its main relevance is due to reverse-mode AD, which is unrelated to Turing.

You can read more about the use of filldist in the Performance Tips (see the section “Ensure that types in your model can be inferred”) if you are interested in scenarios where filldist helps to simplify the implementation.

Thank you Martin for your reply.

Let me answer to the two questions separately. Before that let me also make a note about the documentation and the tutorials.

I am notoriously lazy, but understanding a model exactly required me to pore thoroughly over the tutorials. OK, I know I should RTFM, but then again some initial explanations would be welcome in the ‘Getting Started’ section, and open questions remain; e.g., is the expr in a model always a function?

But let’s go back to the for loop where you “declare” that each of the vector’s components is distributed as a Poisson, Bernoulli, Normal or anything else.

IMHO, a “declaration” should not be built using something as procedural as a for loop, even conceding that the order of declarations is important. Of course, I did not design Turing; I am just commenting on it.

The same applies to the use of filldist; IMHO, it still confuses the regular computer scientist guy, like me, as it is again a non-declarative invocation of a library function.

In other words, I think that within the model macro some other local macro could be set up to declare that a name denotes a vector of variables with a given distribution.

Not sure what to propose, maybe something as simple as

y[] ~ Bernoulli(x)

may work and indicate that y is going to be a vector. Macrology can surely give you this.

Just my 2 cents.

All the best

Marco

The documentation could definitely be improved. To answer your question, the model macro only acts on function definitions, so yes, the expression after @model always has to be a function. I’m not sure we mention that anywhere; that’s a good point.

That’s an interesting point. I’ll cc @mohamed82008 into the discussion. It would be interesting to move even closer to a mathematical notation of the process, e.g.

@model function test(y)
    λ ~ Exponential(α)
    y[i] ~ Bernoulli(λ)
end

(again without the use of a for loop).

Or a more complex example:

@model function test(y; K=2, V=3, μ₀ = 0.0, σ₀ = 10.0)
    α ~ Gamma()
    β ~ Gamma()

    ν ~ Dirichlet(V, β/V)
    c[d] ~ Categorical(ν)
    
    w[v] ~ Dirichlet(K, α/K) # 1 < v <= V
    z[v,i] ~ Categorical(w[v])

    μ[v,k] ~ Normal(μ₀, σ₀) # 1 < v <= V, 1 < k <= K
    
    y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end

but as we see, this gets a bit more difficult to declare, and more elaborate models might be even more challenging.

Maybe independently of those ideas, to understand why we don’t do this atm: the main reason is that we wanted Turing to act on standard Julia functions and only add minimal changes to the code written by the user, thus making it easier to debug the code, optimise it by hand, and integrate arbitrary Julia packages. Those points would potentially get more difficult with approaches like those outlined above.

In Soss this would be

test = @model n begin
    λ ~ Exponential(α)
    y ~ Bernoulli(λ) |> iid(n)
end

or

test = @model n begin
    λ ~ Exponential(α)
    y ~ For(n) do j
        Bernoulli(λ)
    end
end

But there’s really nothing Soss-specific about iid and For; in principle it should be quick to get this working in Turing:

@model function test(y)
    n = length(y)
    λ ~ Exponential(α)
    y ~ Bernoulli(λ) |> iid(n)
end
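For comparison, a minimal sketch of the same model written with the filldist that Turing already provides (α assumed defined, as in the examples above):

@model function test(y)
    n = length(y)
    λ ~ Exponential(α)
    y ~ filldist(Bernoulli(λ), n)
end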

How would you write the second model?

What if i has a value and we actually want to do a time-series style model? It’s almost impossible to parse this correctly as suggested here without an extra hint, e.g. the @einsum notation from the tensor packages (e.g. https://github.com/ahwillia/Einsum.jl). But then should we create a new variable y or mutate an existing variable y? This is still hard to determine correctly at parse time but not impossible. For example, we can see if y has been defined or not. If not, we define it. Since ~ can only be used with variables that are usually defined inside the model for AD compatibility, or with observations from the arguments, we have almost all the information we need at the macro level. We would just need to assume that y isn’t obscurely defined using a macro, which is rare, so that may be an acceptable tradeoff.

Tbh, I think it doesn’t get much cleaner than the Soss syntax here. So maybe we can just have iid do something like filldist under the hood (all the AD efficiency tricks, I mean). But the tensor notation idea can also be neat.
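A minimal sketch of what that could look like (a hypothetical helper, not an existing Turing API):

# iid(dims...) returns a closure that wraps a distribution in filldist
# (from Turing/DistributionsAD), so that `Bernoulli(λ) |> iid(n)`
# becomes `filldist(Bernoulli(λ), n)`.
iid(dims::Integer...) = d -> filldist(d, dims...)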


This one?

@model function test(y; K=2, V=3, μ₀ = 0.0, σ₀ = 10.0)
    α ~ Gamma()
    β ~ Gamma()

    ν ~ Dirichlet(V, β/V)
    c[d] ~ Categorical(ν)
    
    w[v] ~ Dirichlet(K, α/K) # 1 < v <= V
    z[v,i] ~ Categorical(w[v])

    μ[v,k] ~ Normal(μ₀, σ₀) # 1 < v <= V, 1 < k <= K
    
    y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end

I guess it would be something like

test = @model D, I, K, V, μ₀, σ₀ begin
    α ~ Gamma()
    β ~ Gamma()

    ν ~ Dirichlet(V, β/V)
    c ~ Categorical(ν) |> iid(D)

    w ~ Dirichlet(K, α/K) |> iid(V)
    z ~ For(V, I) do v, i
        Categorical(w[v])
    end

    μ ~ Normal(μ₀, σ₀) |> iid(V, K)

    y ~ For(D, I) do d, i
        Normal(μ[c[d], z[c[d], i]], 0.5)
    end
end

Yes, that’s what I was trying to hint at in the second example. And yes, when writing it I was thinking about the Einsum package. :slight_smile:


Oh, I just thought it would be interesting to consider it and see what the related issues are, and there are plenty.

Yes, the iid syntax would be great to have. @cscherrer would it be possible to integrate that into Turing?

Do you need your tailored For function here?