SIR modeling in Soss

I spent some time yesterday getting as SIR epidemiology model working in Soss. I see this a lot as a differential equation, but that doesn’t make sense for real data, because what we really observe are counts. So I used a discrete Markov chain. The transitions are “global” relative to the Markov chain, so this was an interesting design challenge.

EDIT: I should say, it doesn’t make sense to me for real data with small counts.

I ended up with this for the Markov kernel:

using Soss

mstep = @model pars,state begin
    # Parameters
    α = pars.α        # Transmission rate
    β = pars.β        # Recovery rate
    γ = pars.γ        # Case fatality rate

    # Starting counts
    s0 = state.s      # Susceptible
    i0 = state.i      # Infected 
    r0 = state.r      # Recovered
    d0 = state.d      # Deceased
    n0 = state.n       # Population size
    # Transitions between states
    si ~ Binomial(s0, α * i0 / n)
    ir ~ Binomial(i0, β)
    id ~ Binomial(i0-ir, γ)
    # Updated counts
    s = s0 - si 
    i = i0 + si - ir - id
    r = r0 + ir
    d = d0 + id
    n = n0 - id

    next = (pars=pars, state=(s=s,i=i,r=r,d=d, n=n))

This is wrapped in another model that puts priors on the rate parameters:

m = @model s0 begin
    α ~ Uniform()
    β ~ Uniform()
    γ ~ Uniform()
    pars = (α=α, β=β, γ=γ)
    x ~ MarkovChain(pars, mstep(pars=pars, state=s0))

EDIT: Benjamin Vincent caught some errors in the model and its interpretation, see here.

And here’s inference:

julia> s0 = namedtuple(df[1,:])
(i = 1, r = 0, d = 0, n = 331000000, s = 330999999, si = 0, ir = 0, id = 0)
julia> post = dynamicHMC(m(s0=s0),(x=namedtuple.(eachrow(df)),));
julia> ppost=particles(post)
(γ = 0.00423 ± 0.00011, β = 0.00234 ± 7.9e-5, α = 0.279 ± 0.00084)

The next obvious thing is to use this for a forecast. I don’t want to share this yet, as the result is really bleak. I haven’t yet done any model checking, but I expect the model is much too simple for realistic forecasting.

To draw from the prior, you can do

s0 = (i = 1, r = 0, d = 0, n = 331000000, s = 330999999);

r = rand(m(s0=s0));

for (n,s) in enumerate(r.x)
    n>10 && break

Note that I had to add a few things to get this to work. Full code is available here

EDIT: I’ve updated the model to work with the current master branch of Soss; MarkovChain is now included.


Everything is an approximation, and discrete time is also an approximation. Dropping that gives a continuous time discrete model, also known as a Gillespie model, is a form of “discrete differential equation” which limits to stochastic differential equation and an ordinary differential equation. This model is fairly widely used in epidemic modeling, and coincidentally the DiffEq documentation has been using an epidemic model in its tutorial for this:

using DiffEqBiological
sir_model = @reaction_network SIR begin
    c1, s + i --> 2i
    c2, i --> r
end c1 c2

p = (0.1/1000,0.01)
prob = DiscreteProblem([999,1,0],(0.0,250.0),p)
jump_prob = JumpProblem(prob,Direct(),sir_model)
sol = solve(jump_prob,FunctionMap())
using Plots; plot(sol)


As the copy number (the number of individuals) approaches infinity, the stochasticity in the model drops (since that grows as sqrt(N) as opposed to the mean of N), and thus it approaches the ODE behavior (which should be clear from the behavior). That’s why the ODE is widely used, because then you can do things like bifurcation analysis to predict stability of the model and do things like derive an approximating equation for the peak of the distribution.

But there are many ways to discretize the continuous time discrete equations, and it turns out your model is equivalent to the binomial leaping method on this kind of model but with a fixed dt: . Note that there’s a lot of studies on how you can then improve the model by making it implicit, adapting dt, etc. to stabilize it. So the other thing is just that differential equations (and discrete continuous-time Markov chains) are just a good place to do numerical analysis to figure out how things can go unstable and how to derive more accurate/stable methods.

That model should directly work in Turing.jl with DiffEqBayes.jl, but it would be nice to get an example together with SOSS!


I wrote a blog post about the SEIR model applied to the covid19 situation in Denmark. As a next step I would indeed like to have the rates be distributions instead to get a reasonable quantification of the uncertainty involved. Do I need DiffEqBayes for using Turing or Soss to accomplish that? I’ve never tampered with stochasticity in ODEs before. The blog post is here: and the code is written in Julia.


You don’t need DiffEqBayes to use it with Turing. If you just use concrete_solve in there even the Zygote-based autodiff will just work out fine. The DiffEqBayes is just a simplified interface for folks who are not so familiar with probabilistic programming and just want to churn out the standard Bayesian parameter fit, and it’s setup inside to always try to “do the right thing” in that case. (It’s also a very good way to continue testing that DiffEq and Turing work together).


Got it.

That’s a nice blog post. I computed some Bayesian estimates of a similar model.


Thanks for a very interesting analysis :slight_smile: I have a question if you don’t mind.

  • For italy, one of the chains (orange) seems very different from the others, does this correspond to a different mode of the posterior? If so, could there be other modes that have not been explored?

@Paul_Schrimpf, it appears that you have very bad mixing, this should be fixed before any kind of conclusion is drawn from the model (kudos for including the relevant plots to diagnose this, I wish all researchers did this).


Reply, yes there could be other modes. In general, the chains do not look great. I’m running them for more iterations now.

Agreed. Thanks for the feedback. I am working on improving it.

@baggepinnen, @Tamas_Papp After running for many more iterations, the chains still appear different from one another for Italy, Canada, and the US. @baggepinnen, I think you’re right that the posterior is simply multi-modal. In some sense this isn’t surprising — the model has many unobserved state variables, and seeing a short part of the path of the three observed variables is not enough to separately identify the parameters.

For China and Korea, where the number of active cases has begun decreasing, the chains look better and the posterior appears uni-modal. On the other hand, the model does not fit the data as well in these countries. This too is not very surprising. A model with constant transmission rates cannot explain the evolution of cases seen in those countries.


I wonder if this is the right intuition though. Usually non-identified parameters just result in “ridges” and their higher dimension equivalents, not separate modes. Model misspecification is usually a more common reason for these kind of mixing problems.

Whatever the reason may be, my point was that unless the chains mix well, the inference will not be valid.

Maybe a model with time-varying transmission rates could be interesting. Eg consider a multilevel model where the transmission rate comes from common lognormal (with parameters estimated). Could this pick up the policy changes?

1 Like

You may be interested to look at the SIR-X model. See also the COVID-19 event horizon of Humboldt university Berlin.


It is indeed an interesting model (thanks for mentioning it), but assumes continuous parameter/policy changes. It is my impression that most countries implemented large, discrete changes.

In any case what I am most skeptical about is the measurement side of the models. Even assuming a sophisticated model for the transmission, symptoms, and recovery/death, only serious cases show up in the data with certainty. Testing for the rest of the population is very heterogeneous by country and time, and probably not modeled well with an IID approach (eg testing susceptible contacts).

1 Like

Very minor point, but I believe γ is not the case fatality rate as that is the fatality rate per time unit. So perhaps the case fatality rate is γ/β ?

I agree γ is really the fatality rate per time unit, but I don’t yet how the case “eventual fatality” rate is γ/β. I’d think that’s an odds ratio, and the actual rate is γ/(β+γ). But I might well be missing something.

Also, thanks for your note about changing population size!

1 Like

Yes, I think you’re right.

1 Like