Hello all.
I am trying to use Turing to run a state space model based on Simon Jackman’s Bayesian Analysis for the Social Sciences. In chapter 9, he sets out a few different latent variable models, including a state space model for pooling the polls. Basically, there is an underlying “true” voting intention that we are trying to estimate from a bunch of noisy polls. By forcing the model to start and end at actual election results, we can estimate the house effects, which may or may not sum to zero. There is a good case study of this model, done using Stan, by Peter Ellis.
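In equations (my notation, not Jackman’s exactly), the model is: ξ_t is the latent daily ALP voting intention, δ_j is pollster j’s house effect, and each poll’s sampling error is treated as known:

$$\xi_1 = \text{ALP}_{2004}, \qquad \xi_N = \text{ALP}_{2007}$$
$$\xi_t = \xi_{t-1} + \sigma z_t, \qquad z_t \sim \text{Normal}(0, 1), \qquad t = 2, \dots, N-1$$
$$y_i \sim \text{Normal}\big(\xi_{\text{day}(i)} + \delta_{\text{pollster}(i)},\ \text{moe}_i\big)$$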
I’ve run this model using Stan, and it takes about 7 minutes on my laptop. However, when I run it using Turing, it takes about 3 hours using NUTS. I suspect the reason has to do with using a for loop. Other samplers didn’t really give me an improvement in performance.
If anyone has any ideas on how to speed up the model, please let me know.
Code:
using Plots, StatsPlots
using DataFrames, CSV
using Turing, ReverseDiff, Memoization
using RDatasets
using StatsFuns
using Dates
# Helper function: standard error of a poll's reported proportion x, given sample size ss
function calc_moe(x, ss)
    return sqrt(x * (1 - x) / ss)
end
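For a sense of scale (made-up numbers), a poll reporting 40% support from a sample of 1,000 gets a standard error of about 1.5 points:

calc_moe(0.4, 1000)  # ≈ 0.0155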
# load and clean data
df = RDatasets.dataset("pscl", "AustralianElectionPolling")
dateformat = DateFormat("y-m-d")
Election2004 = Date("2004-10-09")
Election2007 = Date("2007-11-24")
N_days = Dates.value(Election2007 - Election2004) + 1
df.NumDays = df.EndDate .- Election2004
df.NumDays = Dates.value.(df.NumDays) .+ 1
df.ALPProp = df.ALP ./ 100
df.Poll_ID = [1:size(df, 1);]
pollster_dict = Dict(key => idx for (idx, key) in enumerate(unique(df.Org)))
df.pollster_id = [pollster_dict[i] for i in df.Org]
# Plot polls
scatter(df.NumDays, df.ALPProp, group = df.Org)
# Define model
@model function state_space(y, y_moe, start_election, end_election, poll_date, poll_id,
                            N_days, N_pollsters, pollster_id, ::Type{T} = Float64) where {T}
    # priors
    ξ = Vector{T}(undef, N_days)
    z_ξ ~ filldist(Normal(0, 1), N_days - 2)
    δ ~ filldist(Normal(0, 0.05), N_pollsters)
    σ ~ Exponential(1/5)
    # daily random walk, anchored at the two election results
    ξ[1] = start_election
    ξ[N_days] = end_election
    for i in 2:(N_days - 1)
        ξ[i] = ξ[i-1] + σ * z_ξ[i-1]
    end
    # Measurement model: each poll observes the latent intention on its end date,
    # shifted by its pollster's house effect, with the poll's sampling error as noise
    y .~ Normal.(ξ[poll_date[poll_id]] .+ δ[pollster_id], y_moe[poll_id])
    return ξ
end
N_polls = size(df, 1)
N_pollsters = length(unique(df.pollster_id))
y_moe = calc_moe.(df.ALPProp, df.SampleSize)
y = df.ALPProp
start_election = .3764
end_election = .4338
poll_date = df.NumDays
poll_id = df.Poll_ID
pollster_id = df.pollster_id
n_adapt = 1_000
n_iter = 2_000
n_chains = 4
Turing.setadbackend(:reversediff) # Zygote and Tracker do not work
model = state_space(y, y_moe, start_election, end_election, poll_date, poll_id, N_days, N_pollsters, pollster_id)
chn = sample(model, NUTS(n_adapt, 0.8), MCMCThreads(), n_iter, n_chains)
Thanks!
EDIT: I actually used an Exponential(1/5) prior for σ, since Julia's Exponential is parameterized by the scale, which is the inverse of Stan's rate parameter.
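For reference, this is the kind of loop-free version of the random walk I have in mind when I say the for loop might be the problem. It is an untested sketch (in particular, I haven't checked how well ReverseDiff handles cumsum here), but it should define the same ξ as the loop:

@model function state_space_cumsum(y, y_moe, start_election, end_election, poll_date, poll_id,
                                   N_days, N_pollsters, pollster_id)
    # priors
    z_ξ ~ filldist(Normal(0, 1), N_days - 2)
    δ ~ filldist(Normal(0, 0.05), N_pollsters)
    σ ~ Exponential(1/5)
    # random walk built with cumsum instead of a scalar loop:
    # interior days are start_election + σ * (z_ξ[1] + ... + z_ξ[i-1]),
    # with the endpoints pinned to the two election results
    ξ = vcat(start_election, start_election .+ σ .* cumsum(z_ξ), end_election)
    # measurement model (unchanged from above)
    y .~ Normal.(ξ[poll_date[poll_id]] .+ δ[pollster_id], y_moe[poll_id])
    return ξ
end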