Parameter estimation problem in Turing

Hi,
I am new to Turing PPL.
I have a random variable X that is Gamma distributed, i.e., X ~ Ga(α, β), and the scale parameter β of X also obeys a Gamma law, namely β ~ Ga(α0, β0). I want to infer the parameters from observed data of X; I believe the parameters to estimate are α, α0, and β0.
I implemented the following solution based on the NUTS sampler in Turing.

using Turing, MCMCChains, Distributions
using StatsPlots
# construct the probabilistic model
@model function rdeg(; N::Int)
    # define prior
    α ~ Gamma(0.1, 0.2)
    α0 ~ Gamma(0.1, 0.2)
    β0 ~ Gamma(0.1, 0.2)

    # Likelihood
    β ~ Gamma(α0, β0)
    y ~ filldist(Gamma(α, β), N)
    # y ~ filldist(Gamma(α, Gamma(α0, β0)), N)
    return y

end
rdeg(y::AbstractVector{<:Real}) = rdeg(; N=length(y)) | (; y);

# collect data
data = Float64[]
for i in 1:120
    β = rand(Gamma(0.6, 0.04))
    push!(data, rand(Gamma(0.4, β)))
end
model = rdeg(data)
# sampler = HMC(0.09, 15)
sampler = NUTS() # a nice sampler
chain = sample(model, sampler, MCMCSerial(), 10_000, 4, progress=false)

As seen in the program above, I set the true values α, α0, β0 = 0.4, 0.6, 0.04, and got the following output:

(screenshot of the chain summary statistics)
First, the estimates of α, α0, β0 are not very accurate: α is nearly OK, but α0 and β0 are not.
Second, I am not sure whether my implementation of the inference model in the @model block is correct: the results list four parameters, α, β, α0, β0.

Thank you very much for looking into my problem!

Hi again,

Try to make a plot of the chains, that’s usually very useful for understanding your results.
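A minimal sketch of what that could look like with StatsPlots (assuming `chain` is the object returned by `sample` above):

```julia
using StatsPlots  # adds plot recipes for MCMCChains

# Trace plots and marginal densities for every parameter:
plot(chain)

# Or restrict to the hyperparameters of interest:
plot(chain[[:α, :α0, :β0]])
```

Flat, wandering traces or chains that disagree with each other are usually the first sign that something in the model (or the priors) is off.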


Thank you for the suggestion!
I will add a visualization of the chains when doing MCMC.

At first glance, the model you have written differs from the generative process. In the latter, each observation y_i has its own \beta_i associated with it, while in your model a single \beta is sampled, which informs all y.

Instead of one \beta, I believe you want something like

 β ~ filldist(Gamma(α0, β0), N)
 for i in 1:N
     y[i] ~ Gamma(α, β[i])
 end

which for 1000 samples along one chain yields for me

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

           α    0.4783    0.1643     0.0052    0.0232    26.4629    1.0126        0.9557
          α0    0.5508    0.2129     0.0067    0.0302    25.4590    1.0162        0.9194
          β0    0.0467    0.0191     0.0006    0.0016   121.8532    1.0064        4.4006

That’s not too bad I’d say.


@skleinbo Thank you very much for providing this solution.
Yes, your implementation of the model is exactly what I want.
I will further develop based on your solution.


You probably also want to widen the priors (which I did!). For example, the prior \alpha\sim\text{Gamma}(0.1,0.2) has P(0.3<\alpha<0.4) \approx 0.005.
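That number is easy to check with Distributions.jl. A quick sketch, comparing the original prior against the flatter `Gamma(1.0, 0.2)` used later in the thread:

```julia
using Distributions

narrow = Gamma(0.1, 0.2)   # the original prior (shape, scale)
wide   = Gamma(1.0, 0.2)   # a flatter alternative

# prior mass near the true value α = 0.4
mass_narrow = cdf(narrow, 0.4) - cdf(narrow, 0.3)
mass_wide   = cdf(wide, 0.4) - cdf(wide, 0.3)

println(mass_narrow)   # tiny, matching the ≈0.005 quoted above
println(mass_wide)     # over an order of magnitude more
```

With so little prior mass near the truth, the sampler has to work against the prior, which shows up as biased or slow-mixing estimates.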


Hi again. Could you provide the complete example of your implementation?
I just ran into an error saying:

BoundsError: attempt to access 0-element Vector{Float64} at index [1]

Thank you!

Yes. I will try this suggestion.
Thank you very much!

Note that I have changed the model to accept y directly, instead of conditioning it later like you did.

julia> using Turing, MCMCChains, Distributions
       using StatsPlots
       # construct the probabilistic model
       @model function rdeg2(y)
           N = length(y)
           # define prior
           α ~ Gamma(1.0, 0.2)
           α0 ~ Gamma(1.0, 0.2)
           β0 ~ Gamma(1.0, 0.2)

           # Likelihood
           β ~ filldist(Gamma(α0, β0), N)
           for i in 1:N
             y[i] ~ Gamma(α,β[i])
           end
           return y
       end

       # collect data
       data = Float64[]
       for i in 1:120
           β = rand(Gamma(0.6, 0.04))
           push!(data, rand(Gamma(0.4, β)))
       end
       model = rdeg2(data)
       chain = sample(model, NUTS(), 1000)

It doesn’t make a difference for the inference, but if you define rdeg(;N) like you originally did, you’ll need to define

 y = Vector{Float64}(undef, N) # needs to be declared if not argument of the model
 for i in 1:N
   y[i] ~ Gamma(α,β[i])
 end
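Putting those fragments together, a sketch of what the `rdeg(; N)` variant could look like with the declared `y` vector, keeping the conditioning style from the original post (priors widened as suggested above):

```julia
using Turing, Distributions

@model function rdeg(; N::Int)
    # priors on the hyperparameters (widened relative to the original)
    α ~ Gamma(1.0, 0.2)
    α0 ~ Gamma(1.0, 0.2)
    β0 ~ Gamma(1.0, 0.2)

    # one scale parameter per observation
    β ~ filldist(Gamma(α0, β0), N)

    # y must be declared, since it is not an argument of the model
    y = Vector{Float64}(undef, N)
    for i in 1:N
        y[i] ~ Gamma(α, β[i])
    end
    return y
end

# condition on observed data exactly as in the original post
rdeg(y::AbstractVector{<:Real}) = rdeg(; N=length(y)) | (; y);
```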

@skleinbo Thank you very much for the solution and explanation.
There is one thing I need to ask. I ran the program and got:

(screenshot of the summary statistics)

Do the parameters here include all the βs?

Yes, because as I said, each observation comes from a distribution with its own \beta_i.

Had you drawn the observations like

β = rand(Gamma(α0, β0))
y = rand(Gamma(α, β), 120)

your original model would have been the right one.

Just for fun, you could plot a histogram of all the \beta samples from the chain(s) and compare to the generating distribution \text{Gamma}(0.6,0.04).
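A sketch of that comparison, assuming `chain` holds the posterior samples from the model above (`Gamma(0.6, 0.04)` is the distribution the data-generation step drew the βs from):

```julia
using MCMCChains, StatsPlots, Distributions

# gather every β[i] sample across iterations (and chains) into one vector
βnames = namesingroup(chain, :β)           # β[1], β[2], …
βsamples = vec(Array(chain[βnames]))

histogram(βsamples; normalize=:pdf, label="posterior β samples")
plot!(x -> pdf(Gamma(0.6, 0.04), x), 0, maximum(βsamples);
      label="Gamma(0.6, 0.04)", lw=2)
```

If the model has recovered the hierarchy well, the histogram should roughly trace the generating density.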

Thank you very much for the reply!