[ANN]: Turing 0.13.0 - MLE/MAP and prediction

We’re happy to announce Turing 0.13.0, which brings a suite of much-needed features. I am particularly happy about this one because it is a feature-rich release, which is always fun.

Maximum likelihood and maximum a posteriori estimation

Turing’s model flexibility doesn’t necessarily restrict its use to only Bayesian methods – traditional maximum likelihood (MLE) or maximum a posteriori (MAP) estimation are supportable as well. If you work in a big-data style field, it’s now push-button easy to generate MAP or MLE estimates for your model, as long as there are no discrete parameters.

We recently added an interface to the world-class Optim.jl to handle this. Here is a brief example using the classic gdemo model:

using Turing
using Optim
using StatsBase

@model function gdemo()
s ~ InverseGamma(2,3)
m ~ Normal(0, sqrt(s))

1.5 ~ Normal(m, sqrt(s))
2.0 ~ Normal(m, sqrt(s))

return m, s
end

model = gdemo()

# Use the default optimizer, LBFGS.
mle1 = optimize(model, MLE())
mpa1 = optimize(model, MAP())

# Use a specific optimizer.
mle2 = optimize(model, MLE(), Optim.Newton())
mpa2 = optimize(model, MAP(), Optim.Newton())

We extended some of the StatsBase.jl methods, like coeftable, informationmatrix, stderror, vcov, etc. For example, we can examine whether our MAP estimates are statistically significant using standard errors calculated from the information matrix with coeftable:

julia> coeftable(mle1)
──────────────────────────────
estimate  stderror    tstat
──────────────────────────────
s    0.0625  0.0625    1.0
m    1.75    0.176777  9.89949
──────────────────────────────

or for the MAP estimate:

julia> coeftable(mpa1)
──────────────────────────────
estimate  stderror    tstat
──────────────────────────────
s  0.907407  0.427756  2.12132
m  1.16667   0.549972  2.12132
──────────────────────────────

You can also initialize sampling with the MAP or MLE estimates:

chain = sample(model, NUTS(), 1000, init_theta = mpa1.values.array)

We’re looking forward to seeing what people do with these new features. As always, this is new, so please report any issues you find.

Prediction

Historically, Turing has not natively supported posterior prediction, and many users had to roll their own code. No longer! Turing now implements the StatsBase.predict function that allows you to predict your data conditional on a previous Markov chain. Make sure you load StatsBase or call StatsBase.predict.

Here’s an example:

using Turing
using StatsBase

@model function linear_reg(x, y, σ = 0.1)
β ~ Normal(0, 1)

for i ∈ eachindex(y)
y[i] ~ Normal(β * x[i], σ)
end
end

f(x) = 2 * x + 0.1 * randn()

Δ = 0.1
xs_train = 0:Δ:10; ys_train = f.(xs_train);
xs_test = [10 + Δ, 10 + 2 * Δ]; ys_test = f.(xs_test);

# Infer
m_lin_reg = linear_reg(xs_train, ys_train);
chain_lin_reg = sample(m_lin_reg, NUTS(100, 0.65), 200);

# Predict on two last indices by making an uninitialized vector of length(ys_test)
# Make a new version of the model -- note that we put in our training data instead.
m_lin_reg_test = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)));

# Use the new predict function!
predictions = predict(m_lin_reg_test, chain_lin_reg)

# Get the mean predicted values.
ys_pred = collect(vec(mean(predictions[:y].value; dims = 1)))

# Get the prediction error:
ys_test - ys_pred

Forecast error:

2-element Array{Float64,1}:
0.057563331644999494
-0.07099319663700498

Prior sampling

Prior predictive checks are now much easier. You can sample directly from your prior with

chain = sample(model, Prior(), n_samples)

Not too exciting, but very necessary for good Bayesian analysis.

33 Likes

I thought I should also share some code I had handy that lets you analyze arbitrary ARMA(p,q) processes with trends and constants, which is an excellent use case for the MLE/MAP code.

Here’s the code:

using Optim
using Turing
using Statistics
using LinearAlgebra
using StatsBase

@model function ARMA(data, p, q, has_intercept::Bool=true, has_trend::Bool=true, ::Type{TV} = Vector{Float64}) where TV
# Variance parameter
σ ~ InverseGamma(2, 3)

# AR parameters
if p > 0
Φ ~ MvNormal(p, 1)
end

# MA parameters
if q > 0
θ ~ MvNormal(q, 1)
end

# Intercept
if has_intercept
intercept ~ Normal(0, 1)
end

# Trend parameter
if has_trend
trend ~ Normal(0, 1)
end

# Initialize mu
mu = TV(undef, length(data))

# Randomly initialize the first max(p, q) data points using the
# unconditional mean as a prior.
mu_bar = mean(data)
mu_std = std(data)
for t in 1:max(p,q)
mu[t] ~ Normal(mu_bar, mu_std)
end

# Generate mean values
for t in (max(p,q) + 1):length(data)
# If we have an AR component, calculate what it is.
ar_component = p > 0 ?
Φ'data[(t-p):(t-1)] :
0

# If we have an MA component, calculate what it is.
pred_error = data[(t-q):(t-1)] - mu[(t-q):(t-1)]
ma_component = q > 0 ?
θ'pred_error :
0

# Add the AR and MA component to the mean guess.
mu[t] = ar_component + ma_component

# If there is an intercept or a trend, add those to the mean estimate.
if has_intercept
mu[t] += intercept
end

if has_trend
mu[t] += t * trend
end
end

# Observe our data according to the calculated means.
data ~ MvNormal(mu, sqrt(σ))
end

# The true AR and MA parameters.
phi = 0.5
theta = 0.2

# Generate synthetic ARMA(1,1) data.
T = 500
ε = randn(T)
data = zeros(T) + ε

for t in 2:length(data)
data[t] += phi * data[t-1] + theta * ε[t-1]
end

# Put the data into our model.
model = ARMA(data, length(phi), length(theta))

# Get the MLE.
mle = optimize(model, MLE(), LBFGS(), Optim.Options(iterations = 1000, allow_f_increases = true))

# Analyze results.
coeftable(mle)

Results:

───────────────────────────────────────────────
estimate     stderror       tstat
───────────────────────────────────────────────
σ           0.970949    0.0614082    15.8114
Φ[1]        0.465351    0.0654885     7.10584
θ[1]        0.219487    0.073532      2.98492
intercept  -0.0263586   0.108075     -0.243891
trend      -1.73471e-5  0.000373044  -0.0465015
mu[1]      -0.307598    0.975998     -0.315163
───────────────────────────────────────────────

As we can see, the ML estimator does not find much evidence for an intercept or a trend, correctly so. To see why this is the case, you can read the t-stats as roughly equivalent to t-stats in OLS – when the absolute values are less than ~1.6, there is not much statistical significance (depending on convention).

It does however get fairly close to the true values of the AR and MA parameters, and will asymptotically converge as T goes to infinity.

Definitely a pleasure to work with Turing’s new functionality. Give it a shot.

13 Likes

If would be a lot easier to read if the output of codes can be shown directly in the document.

We don’t show the output in those samples because they have a different rendering flow and we don’t want the graphics/output to be out of date.

If you are looking for something with output, please see the tutorials.

3 Likes

FIrst of all, thanks for the amazing features, this is incredibly convenient!

However, running a simple linear regression I’m currently running into an error doing the prediction:

predict(mpred, chain)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
...
Stacktrace:
[1] setproperty!(::Base.RefValue{Float64}, ::Symbol, ::Missing) at .\Base.jl:34
[2] setindex!(::Base.RefValue{Float64}, ::Missing) at .\refvalue.jl:33
[4] tilde_observe(::DynamicPPL.DefaultContext, ::DynamicPPL.SampleFromPrior, ::MvNormal{Float64,PDMats.ScalMat{Float64},Array{Float64,1}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarName{:y,Tuple{}}, ::Tuple{}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\context_implementations.jl:90
[5] macro expansion at .\REPL[5]:6 [inlined]
[6] ##evaluator#271(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromPrior, ::DynamicPPL.DefaultContext) at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\compiler.jl:356
[8] Model at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\model.jl:136 [inlined]
[9] Model at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\model.jl:126 [inlined]
[10] VarInfo at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\varinfo.jl:110 [inlined]
[11] VarInfo at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\varinfo.jl:109 [inlined]
[12] transitions_from_chain(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}; sampler::DynamicPPL.SampleFromPrior) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:746
[13] predict(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}; include_all::Bool) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:674
[14] predict(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:671
[15] top-level scope at REPL[9]:1

Here’s the code to generate data and fit the model:

using Distributions, Turing, StatsBase

x = rand(Normal(0, 1), 100);
f(x) = 2x
y = f.(x) + rand(Normal(0, 1), 100);

@model linreg(x, y) = begin
α ~ Normal(0,1)
β ~ Normal(0,5)
σ ~ Exponential(2)
μ = α .+ x*β
y ~ MvNormal(μ, σ)
end;

m = linreg(x, y);
chain = sample(m, NUTS(100, .65), 1000)
mpred = linreg(x, Vector{Union{Missing, Float64}}(undef, length(y)));
predict(mpred, chain)

Not sure if I messed something up or if this is a bug.

Here’s my package info:

(@v1.4) pkg> status
Status `C:\Users\Jakob\.julia\environments\v1.4\Project.toml`
[7820620d] AgentsPlots v0.3.0
[c52e3926] Atom v0.12.10
[39de3d68] AxisArrays v0.4.3
[3895d2a7] CUDAapi v4.0.0
[5ae59095] Colors v0.12.1
[a93c6f00] DataFrames v0.21.0
[864edb3b] DataStructures v0.17.15
[5721bf48] DataVoyager v1.0.0
[31c24e10] Distributions v0.23.2
[634d3b9d] DrWatson v1.10.3
[587475ba] Flux v0.10.4
[a2cc645c] GraphPlot v0.3.1
[e5e0dc1b] Juno v0.8.1
[093fc24a] LightGraphs v1.3.1 #master (https://github.com/JuliaGraphs/LightGraphs.jl.git)
[c7f686f2] MCMCChains v3.0.12
[ee78f7c6] Makie v0.10.0
[eff96d63] Measurements v2.1.1
[0987c9cc] MonteCarloMeasurements v0.8.10
[5fb14364] OhMyREPL v0.5.5
[d96e819e] Parameters v0.12.1
[b98c9c47] Pipe v1.2.0
[91a5bcdd] Plots v1.2.3
[d330b81b] PyPlot v2.9.0
[ce6b1742] RDatasets v0.6.1
[295af30f] Revise v2.6.7
[2913bbd2] StatsBase v0.33.0
[65254759] StatsMakie v0.2.1
[f3b207a7] StatsPlots v0.14.6
[fce5fe82] Turing v0.13.0