Turing package(s) + parameter estimation for DiffEq models?

I also ran it four times, with even more variation. And I think the same happened when I ran 10_000 iterations. I’ll check it out again, though. Maybe I’m messing it up. But if the results are quite different when run, say, 5 or 10 times, it makes me wonder how the results can be trusted. But again: I may be doing something wrong with the initialization.

Well, it’ll have to wait until tomorrow.

It might be caused by variation in the step size. I’ll pull in @Kai_Xu for his opinion.

I may have “stumbled” across the solution…

I re-ran the code today with a slight modification… Yesterday, I did the following each time I ran the code:

bayes_est = turing_inference(prob,Tsit5(),tvec,data_lv,priors,num_samples=10_000);

BUT today I did the following:

priors = [truncated(Normal(1.5,0.5),0.5,2.5),truncated(Normal(1.2,0.5),0,2),
    truncated(Normal(3.0,0.5),1,4),truncated(Normal(1.0,0.5),0,2)]
#
bayes_est = turing_inference(prob,Tsit5(),tvec,data_lv,priors,num_samples=10_000);

Could it be that the function turing_inference modifies the priors, and that it is important to reset the priors before re-estimating?
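
One way to test that hypothesis (a minimal sketch, reusing the calls from above):

priors_before = deepcopy(priors)
bayes_est = turing_inference(prob,Tsit5(),tvec,data_lv,priors,num_samples=10_000);
priors == priors_before  # should return true if turing_inference leaves the priors untouched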

… but no. When re-running it, I get the following:

[screenshot of results]

In some of these runs, I got loads of warnings – about numerical inaccuracies. I also saw cases where turing_inference chose values which made the system unstable.

I would recommend using the @model based approach as discussed in the tutorial; since DiffEqBayes is not very actively maintained anymore, it would be easier for us to diagnose if something is going wrong.

I still do think this variation comes up due to the determination of the step size. Can you check the step sizes for the chains from the different runs?

bayes_est.value.data[:,10,1] should work
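
For example (a sketch; this assumes, as above, that column 10 of the raw chain data holds the step size):

using Statistics
# collect the step-size column from a few independent runs
step_sizes = map(1:5) do _
    est = turing_inference(prob, Tsit5(), tvec, data_lv, priors, num_samples=10_000)
    est.value.data[:, 10, 1]
end
mean.(step_sizes)  # strongly differing means point to step-size adaptation as the culprit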

OK – the tutorial on Turing seems to work ok (although I get different results each time I run the fitting).

NEXT project – try to use Turing when I only measure a subset of the states (SIR model + data for I – the number of infected). In my initial attempt, I get an error message of inconsistent array dimensions…

Here is my attempt. First, the experimental data:

N = 763
tm = 3:14
Id = [25, 75, 227, 296, 258, 236, 192, 126, 71, 28, 11, 7]

The deterministic model:

# Function for the deterministic SIR model
function sir_expanded!(dx,x,p,t,N)
    S,I,R = x           # susceptible, infected, recovered
    tau_i,tau_r = p     # infection and recovery time constants
    #
    dS = -I*S/tau_i/N
    dI = I*S/tau_i/N - I/tau_r
    dR = I/tau_r
    dx .= [dS,dI,dR]
end
#
sir! = (dx,x,p,t) -> sir_expanded!(dx,x,p,t,N)

Next, initial values, parameters, etc. + problem solving:

I3 = Float64(Id[1])
S3 = N-I3
R3 = 0.
x3 = [S3,I3,R3]
#
tau_i = 0.553
tau_r = 2.15
RR0 = round(tau_r/tau_i,digits=1)
p = [tau_i, tau_r]
#
tspan=(3.,14.)
#
prob = ODEProblem(sir!,x3,tspan,p)
sol = solve(prob)

[I can plot the data + simulation results:

fg = plot(sol,lw=LW1,lc=[:blue :red :green],label=["\$S, \\mathsf{R}_0=$RR0\$" "\$I, \\mathsf{R}_0=$RR0\$" "\$R, \\mathsf{R}_0=$RR0\$"])
scatter!(fg,tm, Id, st=:scatter,ms=MS1,mc=:red,label=L"I^\mathrm{d}")
plot!(xlabel="time [day]",ylabel="# boys",title="Boarding school measles infection")
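
(LW1 and MS1 are line-width/marker-size constants defined elsewhere in my notebook; the L"..." labels require the LaTeXStrings package.)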

]

NOW, I try to use Turing to solve the problem (the only new package I import is via using Turing):

Turing.setadbackend(:forwarddiff)
#
@model function fit_sir(I_data)
    V ~ InverseGamma(2,3)
    tau_i ~ truncated(Normal(0.5,0.25),0.2,1)
    tau_r ~ truncated(Normal(2,0.5),0.5,10)
    #
    p = [tau_i,tau_r]
    prob = ODEProblem(sir!, x3, tspan, p)
    I_pred = solve(prob, Tsit5(), saveat=1., save_idxs=[2])
    #
    for i in 1:length(I_pred)
        I_data[:,i] ~ MvNormal(I_pred[i], V)
    end
end
#
model = fit_sir(Id)

Finally, I run the sample method to do the parameter estimation:

p_est = sample(model,NUTS(.65),10_000)

This leads to an error message:

DimensionMismatch("Inconsistent array dimensions.")

Stacktrace:
 [1] logpdf(::MvNormal{Float64,PDMats.ScalMat{Float64},Array{Float64,1}}, ::Array{Int64,1}) at C:\Users\Bernt_Lie\.julia\packages\Distributions\jFoHB\src\multivariates.jl:193
 [2] observe at C:\Users\...

As far as I can see, the only real difference compared to the tutorial Lotka Volterra model, are: (A) I attempt to specify that only a subset of states are available in the data array (save_idxs), and (B) I have only imported the Turing package, not all the other packages (DiffEqBayes, etc.).

Question: what is it that I do incorrectly?

I is just a scalar, so just use Normal

Arrgh… :nerd_face: – of course!

I change I_data[:,i] ~ MvNormal(I_pred[i], V) to I_data[i] ~ Normal(I_pred[i][1], V).

[A minor problem… in MvNormal, the second argument is the covariance, while in Normal, the second argument is the standard deviation – kind of weird… I’ve seen on Wikipedia that the Inverse Gamma distribution is a common prior for variance estimation. But what would a similar distribution for the standard deviation be?]
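
For reference, the corrected model then reads (my model from above, with only the likelihood line changed):

@model function fit_sir(I_data)
    V ~ InverseGamma(2,3)
    tau_i ~ truncated(Normal(0.5,0.25),0.2,1)
    tau_r ~ truncated(Normal(2,0.5),0.5,10)
    #
    p = [tau_i,tau_r]
    prob = ODEProblem(sir!, x3, tspan, p)
    I_pred = solve(prob, Tsit5(), saveat=1., save_idxs=[2])
    #
    for i in 1:length(I_pred)
        I_data[i] ~ Normal(I_pred[i][1], V)  # V is now interpreted as a standard deviation
    end
end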

Result:

[screenshot of estimation results]

@Vaibhavdixit02 Is the problem with the tutorial related to the recently reported typo in the notebook? I remember it was pointed out that the order of parameters in p is not consistent between data generation and modelling. I wonder if it’s possible to pass something more robust, e.g. a named tuple, to prevent us and users from making such mistakes.

A few “problems” with the tutorial.

  • The confusing order of parameters is not helpful, but it is not a disaster.
  • I think it would be useful to explain how to type the symbol ~… there are many ways to produce a similar-looking symbol, e.g., via the LaTeX code \sim, but the code only works with the plain ~ character from the keyboard.
  • The key problem is that each time I run the Bayes estimation/MCMC, I get different results. That is not a big problem if the results are almost the same each time, but sometimes I get very, very different results, and that will make students suspicious of how well the methods work – at least if this is not commented upon.
  • A large number of packages are introduced, and it is not clear from the tutorial which package is used where. For a learner, it would probably be useful to include the namespace of the package in the function call, e.g., Turing.sample instead of just sample. This helps the student build up an ability to navigate the package ecosystem.
  • Is the DiffEqBayes package/tool eventually going to be phased out and replaced by the @model macro of the Turing package? If so, perhaps the tutorial should be simplified accordingly.
  • Perhaps it would be helpful to explain a little more about the arguments of sample? What is NUTS? If I choose “iterations” (or realizations or whatever) equal to 10_000 – why does the result only have 9_000 iterations? Etc.
  • Some discussion of the data structure of the result would be useful.
  • A “problem” with the Distributions.jl package is that the Normal distribution takes the standard deviation as its second argument, while the MvNormal distribution takes the covariance matrix. Normally, \sigma denotes a standard deviation, but in the tutorial it is used to denote a variance.
  • I figured out from some googling that the Inverse Gamma distribution is often used in Bayesian estimation as a prior for a (co)variance, e.g., the observation variance (which is denoted \sigma in the tutorial). These things could be mentioned in the tutorial – at least if it is meant for people who are not MCMC experts.
  • And what if one has a scalar system? In that case, the second argument of Normal is a standard deviation, not a variance. What is a suitable prior distribution for a standard deviation? Probably not Inverse Gamma??

Anyway, the current version of the tutorial is good, but I think it can be improved without making it longer. At least for a person with my background :thinking:.

Thanks for the feedback. I’m working on improving the tutorial according to your comments, as well as some others we got from Slack recently.

Re. wrong ordering & robust sampling

  • What I mean by “ordering is not consistent” is that in data simulation p = [α, β, δ, γ] was used but for modelling p = [α, β, γ, δ] is used.
  • This effectively makes the prior very bad for both δ and γ, and I highly suspect it’s why the sampling looks very sensitive to initialisation.

Re. ~ vs. \sim

  • That’s a fair point. I think we should at least say somewhere that it’s not \sim.
  • Although for most users the ~ character is commonly used (e.g. as in cd ~ in terminals).
  • Also, I don’t think packages in general tend to output LaTeX/Unicode symbols to users, as not all users know you can use those in Julia.

Turing’s @model is always there, and Turing itself is a general-purpose probabilistic programming system. DiffEqBayes.jl is supposed to provide an interface so that users in the DiffEq ecosystem can do Bayesian inference with minimal effort. I’m not sure what its current status is. Maybe @Vaibhavdixit02 or @ChrisRackauckas can say a word about it.

Re. keyword arguments

  • Surely adding a comment on their meaning would be helpful for people reading it – I will add them.
  • Just FYI, 10_000 is the total number of iterations, out of which 1_000 are used for adaptation and discarded for analysis (by default) – see the sketch below.
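
For instance (a sketch; in the Turing versions I have used, NUTS also accepts the number of adaptation steps as its first argument):

# 10_000 iterations in total; the first 1_000 are adaptation steps,
# and 0.65 is the target acceptance ratio as before
p_est = sample(model, NUTS(1_000, 0.65), 10_000)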

What do you mean by this? You mean the returned chain?

You can do something like

σ ~ InverseGamma(2, 3)
x ~ Normal(μ, sqrt(σ))
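
(Here σ plays the role of a variance, so sqrt(σ) is the standard deviation that Normal expects as its second argument.)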

We’ll keep it around to make benchmarking against Flux easier, but the tools have evolved to the point that direct use of Turing is probably preferred.

See below:

So, the variable you denoted chain is the one I denoted p_est in my example. If I do plot(p_est), I get a pre-digested plot of the results. But what if I want to do some computations myself?

  1. I can see in the screenshot above that the parameters are V, tau_i, tau_r.
  • I can now extract the iterations by p_est[:V], p_est[:tau_i], and p_est[:tau_r], but this is not easy to guess. Perhaps it could have been suggested.
  • And… is there a way to extract the list of parameters from p_est? Something like p_est.parameters (probably not…), or something?
  2. Based on the extracted parameter iterations above, I can do lots of things, e.g., produce correlation plots, etc.
  • Is there a way to extract more information, e.g., the mean, the standard deviation, quantiles, ess, etc.? Of course, these can be computed from the extracted parameter iterations in point 1 above, but since they have already been computed, perhaps they can be extracted from the p_est (or: chain) struct?

Ah. I thought that might be the way to do it, but then I thought it might be too simple :stuck_out_tongue:

The analysis tools are supported by https://github.com/TuringLang/MCMCChains.jl. I would add a sentence in the tutorial to refer readers there for details.
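
For example, something along these lines should work (a sketch; the exact output depends on the MCMCChains version):

names(p_est)         # parameter names, e.g. [:V, :tau_i, :tau_r] plus internal variables
summarystats(p_est)  # mean, std, ess, etc. per parameter
quantile(p_est)      # quantiles per parameter
describe(p_est)      # both of the above at once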

@BLI I would encourage you to take a look at some changes already done in https://github.com/arnaudmgh/TuringTutorials/blob/master/10_diffeq.ipynb and add your recommendations there.

@Kai_Xu I don’t think the issue behind seeing variation in results is due to the parameter ordering issue. @BLI did you compare the step size values for runs that give you very different estimates?

@Kai_Xu I don’t think the issue behind seeing variation in results is due to the parameter ordering issue.

I think you are right. We in fact rename δ and γ in the model def so that the effect cancels out…

@BLI did you compare the step size values for runs that give you very different estimates?

I would say a smaller acceptance ratio might be the solution. Last time, this worked for someone else who reported instability of the tutorial on Slack. I’m updating this in the notebook as well.

Hm… I don’t know how to do that.

I mentioned it here