@model function λ(y)
    # alpha is assumed to be defined in the surrounding scope
    λ₁ ~ Exponential(alpha)
    for i in 1:length(y)
        y[i] ~ Poisson(λ₁)
    end
end
What is the meaning of the y array?
I think it is used to compute the posterior.
When I call a sampler, y will be the data. So in the model structure, is the data overwritten by the distribution? Question 0: How does it work exactly? Question 1: How can I plot the y distribution?
Calling plot(sample(myModel(data), sampler(), n)) does not plot y.
Using PyMC3, the posterior distribution can be plotted as described here:
Question 0:
What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
Question 1:
The chain obtained by sample only contains the unknown variables (here λ₁). You would have to create a separate plot for y, for example by using the histogram function from Plots.jl.
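For instance, a minimal sketch (the sampler choice, the example data, and the alpha value are illustrative assumptions, not from the original post):

using Turing, StatsPlots

alpha = 1.0                            # hyperparameter assumed by the model above
y = rand(Poisson(3.0), 100)            # stand-in for your observed data
chain = sample(λ(y), NUTS(), 1_000)    # contains only the unknown λ₁

plot(chain)      # trace and density of λ₁, not of y
histogram(y)     # the data y has to be plotted separately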
Question 2:
I think you can use y ~ filldist(Poisson(λ₁), length(y))
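As a sketch, the whole loop in the original model could then be replaced like this (the model name and the alpha default are assumptions):

@model function λ_vec(y; alpha = 1.0)
    λ₁ ~ Exponential(alpha)
    y ~ filldist(Poisson(λ₁), length(y))   # all y[i] iid Poisson(λ₁)
end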
What Turing does is: given the data y, it tries to infer the posterior of λ₁; to do that, it samples from the posterior using your sampler.
So in Turing, assigning a distribution with ~ to the data array(s) means "infer the posterior of the variables used"; is that correct? How does Turing use the data to compute the inference?
For example by using the histogram function from Plots.jl
I meant to use histogram on y. Since y is your data, it’s fixed, you don’t have to sample from it.
So in Turing, assigning a distribution with ~ to the data array(s) means "infer the posterior of the variables used"; is that correct? How does Turing use the data to compute the inference?
~ assigns a distribution to a random variable (possibly conditioned on data or other variables). What sampling does is return samples from the posterior over your random variables, in your case p(λ₁ | y).
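A small sketch of what that looks like in practice (the sampler and the number of draws are arbitrary choices, and the model/data names are the ones from the original post):

using Statistics

chain = sample(λ(y), NUTS(), 1_000)   # draws from p(λ₁ | y)
λ₁_draws = chain[:λ₁]                 # the posterior samples of λ₁
mean(λ₁_draws), std(λ₁_draws)         # posterior summaries computed from the draws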
Thanks for the answer, but it still looks a bit "strange" to the eyes of a computer scientist.
First of all, y appears to be "passed in" and then modified in place. Next, when exactly is the for loop executed? At initialization of the model? (Is filldist actually the same as the for loop?)
What would count_data contain after the sampling of the posterior is done?
Sorry for being obdurate, but things are a bit confusing to me.
The tilde symbol indicates that y is distributed according to a Poisson distribution. Since you provide y as a function argument, all values of y are observed, and thus Turing will condition on those values.
count_data will therefore not be changed, but conditioned on.
filldist is a function that simplifies such a for loop when the distribution is always the same. Think of fill in Julia; filldist is the analog for distributions. It can be more performant than a for loop (mostly because of how AD works in Julia), but it is not strictly necessary to use.
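A quick way to convince yourself of this (a sketch: count_data stands for the observation vector discussed above, and the sampler is arbitrary):

y_before = copy(count_data)
chain = sample(λ(count_data), NUTS(), 1_000)
count_data == y_before   # true: the data is only conditioned on, never modified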
And yet, I believe the following (which is a procedural "abstraction/notion/tool/thing") makes things look kind of funny.
If, as you say, y is passed as a function argument and as such is "observed", Turing will condition on it. But if it is an observation, the "declaration" that each of its elements is distributed as a Poisson conflates, IMHO, two things. I am sorry, but the unlabelled mixing of the declarative style and the procedural one is confusing, at least to me, and I do not think it is much improved by the use of filldist.
At a minimum, it should be explained in a better way.
As mentioned, filldist is chosen to be analogous to fill, which is a Julia Base function, and it does not have to be used. Its main relevance is due to reverse-mode AD, which is unrelated to Turing.
You can read more about the use of filldist here: Performance Tips (see the section "Ensure that types in your model can be inferred") if you are interested in scenarios where filldist helps to simplify the implementation.
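To illustrate the fill/filldist analogy, here is a small sketch (the rate 3.0 and the length 4 are arbitrary):

fill(Poisson(3.0), 4)           # a Vector of four identical Poisson distributions
d = filldist(Poisson(3.0), 4)   # a single 4-dimensional product distribution
rand(d)                         # one draw: a length-4 vector of counts
logpdf(d, [2, 3, 1, 4])         # joint log-density of the whole vector at once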
Let me answer the two questions separately. Before that, let me also make a note about the documentation and the tutorials.
I am notoriously lazy, but understanding a model exactly required me to pore thoroughly over the tutorials. OK, I know I should RTFM, but then again some initial explanations would be welcome in the 'Getting Started' section, and open questions remain; e.g., is the expression in a model always a function?
But let's go back to the for loop where you "declare" that each of the vector's components is distributed Poisson, Bernoulli, Normal, or anything else.
IMHO, a "declaration" should not be built using something as procedural as a for loop, even conceding that the order of declarations is important. Of course, I did not design Turing; I am just commenting on it.
The same applies to the use of filldist; IMHO, it still confuses the regular computer-scientist guy, like me, as it is again a non-declarative invocation of a library function.
In other words, I think that within the model macro some other local macro could be set up to declare that a name denotes a vector of variables with a given distribution.
I am not sure what to propose; maybe something as simple as
y[] ~ Bernoulli(x)
may work and indicate that y is going to be a vector. Macrology can surely give you this.
The documentation could definitely be improved. To answer your question, the model macro only acts on function definitions, so yes the expression after @model always has to be a function. I’m not sure we mention that anywhere, that’s a good point.
That's an interesting point. I'll cc @mohamed82008 into the discussion. It would be interesting to move even closer to a mathematical notation for the process, e.g.
@model function test(y)
    λ ~ Exponential(α)
    y[i] ~ Bernoulli(λ)
end
(again without the use of a for loop).
Or a more complex example:
@model function test(y; K = 2, V = 3, μ₀ = 0.0, σ₀ = 10.0)
    α ~ Gamma()
    β ~ Gamma()
    ν ~ Dirichlet(V, β/V)
    c[d] ~ Categorical(ν)
    w[v] ~ Dirichlet(K, α/K)                    # 1 < v <= V
    z[v,i] ~ Categorical(w[v])
    μ[v,k] ~ Normal(μ₀, σ₀)                     # 1 < v <= V, 1 < k <= K
    y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end
but as we see this gets a bit more difficult to declare, and more elaborate models might be even more challenging.
Maybe independently of those ideas, to understand why we don't do this at the moment: the main reason is that we wanted Turing to act on standard Julia functions and only add minimal changes to the code written by the user, thus making it easier to debug the code, optimise it by hand, and integrate arbitrary Julia packages. Those points would potentially get more difficult with approaches like the ones outlined above.
What if i has a value and we actually want to do a time-series style model? It's almost impossible to parse this correctly as suggested here without an extra hint, e.g. the @einsum notation from the tensor packages (e.g. GitHub - ahwillia/Einsum.jl: Einstein summation notation in Julia). But then should we create a new variable y or mutate an existing variable y? This is still hard to determine correctly at parse time, but not impossible. For example, we can check whether y has been defined or not; if not, we define it. Since ~ can only be used with variables that are usually defined inside the model for AD compatibility, or with observations from the arguments, we have almost all the information we need at the macro level. We would just need to assume that y isn't obscurely defined using a macro, which is rare, so that may be an acceptable tradeoff. A sketch of the time-series case is shown below.
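To make the ambiguity concrete, here is a sketch of a time-series style model where the loop index genuinely matters (the model name and the priors are illustrative assumptions, not an existing example):

@model function ar1(y)
    ρ ~ Uniform(-1, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    y[1] ~ Normal(0, σ)
    for i in 2:length(y)
        y[i] ~ Normal(ρ * y[i-1], σ)   # y[i] depends on y[i-1], so the index cannot be elided
    end
end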
Tbh, I think it doesn't get much cleaner than the Soss syntax here. So maybe we can just have iid do something like filldist under the hood, with all the AD efficiency tricks I mean. But the tensor-notation idea could also be neat.
@model function test(y; K = 2, V = 3, μ₀ = 0.0, σ₀ = 10.0)
    α ~ Gamma()
    β ~ Gamma()
    ν ~ Dirichlet(V, β/V)
    c[d] ~ Categorical(ν)
    w[v] ~ Dirichlet(K, α/K)                    # 1 < v <= V
    z[v,i] ~ Categorical(w[v])
    μ[v,k] ~ Normal(μ₀, σ₀)                     # 1 < v <= V, 1 < k <= K
    y[d,i] ~ Normal(μ[c[d], z[c[d],i]], 0.5)
end
I guess it would be something like
test = @model D, I, K, V, μ₀, σ₀ begin
    α ~ Gamma()
    β ~ Gamma()
    ν ~ Dirichlet(V, β/V)
    c ~ Categorical(ν) |> iid(D)
    w ~ Dirichlet(K, α/K) |> iid(V)
    z ~ For(V, I) do v, i
        Categorical(w[v])
    end
    μ ~ Normal(μ₀, σ₀) |> iid(V, K)
    y ~ For(D, I) do d, i
        Normal(μ[c[d], z[c[d], i]], 0.5)
    end
end