# [ANN]: Turing 0.13.0 - MLE/MAP and prediction

We’re happy to announce Turing 0.13.0, which brings a suite of much-needed features. I am particularly happy about this one because it is a feature-rich release, which is always fun.

## Maximum likelihood and maximum a posteriori estimation

Turing’s model flexibility doesn’t necessarily restrict its use to only Bayesian methods – traditional maximum likelihood (MLE) or maximum a posteriori (MAP) estimation are supportable as well. If you work in a big-data style field, it’s now push-button easy to generate MAP or MLE estimates for your model, as long as there are no discrete parameters.

We recently added an interface to the world-class Optim.jl to handle this. Here is a brief example using the classic `gdemo` model:

``````using Turing
using Optim
using StatsBase

@model function gdemo()
s ~ InverseGamma(2,3)
m ~ Normal(0, sqrt(s))

1.5 ~ Normal(m, sqrt(s))
2.0 ~ Normal(m, sqrt(s))

return m, s
end

model = gdemo()

# Use the default optimizer, LBFGS.
mle1 = optimize(model, MLE())
mpa1 = optimize(model, MAP())

# Use a specific optimizer.
mle2 = optimize(model, MLE(), Optim.Newton())
mpa2 = optimize(model, MAP(), Optim.Newton())
``````

We extended some of the StatsBase.jl methods, like `coeftable`, `informationmatrix`, `stderror`, `vcov`, etc. For example, we can examine whether our MAP estimates are statistically significant using standard errors calculated from the information matrix with `coeftable`:

``````julia> coeftable(mle1)
──────────────────────────────
estimate  stderror    tstat
──────────────────────────────
s    0.0625  0.0625    1.0
m    1.75    0.176777  9.89949
──────────────────────────────
``````

or for the MAP estimate:

``````julia> coeftable(mpa1)
──────────────────────────────
estimate  stderror    tstat
──────────────────────────────
s  0.907407  0.427756  2.12132
m  1.16667   0.549972  2.12132
──────────────────────────────
``````

You can also initialize sampling with the MAP or MLE estimates:

``````chain = sample(model, NUTS(), 1000, init_theta = mpa1.values.array)
``````

We’re looking forward to seeing what people do with these new features. As always, this is new, so please report any issues you find.

## Prediction

Historically, Turing has not natively supported posterior prediction, and many users had to roll their own code. No longer! Turing now implements the `StatsBase.predict` function that allows you to predict your data conditional on a previous Markov chain. Make sure you load `StatsBase` or call `StatsBase.predict`.

Here’s an example:

``````using Turing
using StatsBase

@model function linear_reg(x, y, σ = 0.1)
β ~ Normal(0, 1)

for i ∈ eachindex(y)
y[i] ~ Normal(β * x[i], σ)
end
end

f(x) = 2 * x + 0.1 * randn()

Δ = 0.1
xs_train = 0:Δ:10; ys_train = f.(xs_train);
xs_test = [10 + Δ, 10 + 2 * Δ]; ys_test = f.(xs_test);

# Infer
m_lin_reg = linear_reg(xs_train, ys_train);
chain_lin_reg = sample(m_lin_reg, NUTS(100, 0.65), 200);

# Predict on two last indices by making an uninitialized vector of length(ys_test)
# Make a new version of the model -- note that we put in our training data instead.
m_lin_reg_test = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)));

# Use the new predict function!
predictions = predict(m_lin_reg_test, chain_lin_reg)

# Get the mean predicted values.
ys_pred = collect(vec(mean(predictions[:y].value; dims = 1)))

# Get the prediction error:
ys_test - ys_pred
``````

Forecast error:

``````2-element Array{Float64,1}:
0.057563331644999494
-0.07099319663700498
``````

## Prior sampling

Prior predictive checks are now much easier. You can sample directly from your prior with

``````chain = sample(model, Prior(), n_samples)
``````

Not too exciting, but very necessary for good Bayesian analysis.

33 Likes

I thought I should also share some code I had handy that lets you analyze arbitrary `ARMA(p,q)` processes with trends and constants, which is an excellent use case for the MLE/MAP code.

Here’s the code:

``````using Optim
using Turing
using Statistics
using LinearAlgebra
using StatsBase

@model function ARMA(data, p, q, has_intercept::Bool=true, has_trend::Bool=true, ::Type{TV} = Vector{Float64}) where TV
# Variance parameter
σ ~ InverseGamma(2, 3)

# AR parameters
if p > 0
Φ ~ MvNormal(p, 1)
end

# MA parameters
if q > 0
θ ~ MvNormal(q, 1)
end

# Intercept
if has_intercept
intercept ~ Normal(0, 1)
end

# Trend parameter
if has_trend
trend ~ Normal(0, 1)
end

# Initialize mu
mu = TV(undef, length(data))

# Randomly initialize the first max(p, q) data points using the
# unconditional mean as a prior.
mu_bar = mean(data)
mu_std = std(data)
for t in 1:max(p,q)
mu[t] ~ Normal(mu_bar, mu_std)
end

# Generate mean values
for t in (max(p,q) + 1):length(data)
# If we have an AR component, calculate what it is.
ar_component = p > 0 ?
Φ'data[(t-p):(t-1)] :
0

# If we have an MA component, calculate what it is.
pred_error = data[(t-q):(t-1)] - mu[(t-q):(t-1)]
ma_component = q > 0 ?
θ'pred_error :
0

# Add the AR and MA component to the mean guess.
mu[t] = ar_component + ma_component

# If there is an intercept or a trend, add those to the mean estimate.
if has_intercept
mu[t] += intercept
end

if has_trend
mu[t] += t * trend
end
end

# Observe our data according to the calculated means.
data ~ MvNormal(mu, sqrt(σ))
end

# The true AR and MA parameters.
phi = 0.5
theta = 0.2

# Generate synthetic ARMA(1,1) data.
T = 500
ε = randn(T)
data = zeros(T) + ε

for t in 2:length(data)
data[t] += phi * data[t-1] + theta * ε[t-1]
end

# Put the data into our model.
model = ARMA(data, length(phi), length(theta))

# Get the MLE.
mle = optimize(model, MLE(), LBFGS(), Optim.Options(iterations = 1000, allow_f_increases = true))

# Analyze results.
coeftable(mle)
``````

Results:

``````───────────────────────────────────────────────
estimate     stderror       tstat
───────────────────────────────────────────────
σ           0.970949    0.0614082    15.8114
Φ        0.465351    0.0654885     7.10584
θ        0.219487    0.073532      2.98492
intercept  -0.0263586   0.108075     -0.243891
trend      -1.73471e-5  0.000373044  -0.0465015
mu      -0.307598    0.975998     -0.315163
───────────────────────────────────────────────
``````

As we can see, the ML estimator does not find much evidence for an intercept or a trend, correctly so. To see why this is the case, you can read the t-stats as roughly equivalent to t-stats in OLS – when the absolute values are less than `~1.6`, there is not much statistical significance (depending on convention).

It does however get fairly close to the true values of the AR and MA parameters, and will asymptotically converge as `T` goes to infinity.

Definitely a pleasure to work with Turing’s new functionality. Give it a shot.

13 Likes

If would be a lot easier to read if the output of codes can be shown directly in the document.

We don’t show the output in those samples because they have a different rendering flow and we don’t want the graphics/output to be out of date.

If you are looking for something with output, please see the tutorials.

3 Likes

FIrst of all, thanks for the amazing features, this is incredibly convenient!

However, running a simple linear regression I’m currently running into an error doing the prediction:

``````predict(mpred, chain)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
...
Stacktrace:
 setproperty!(::Base.RefValue{Float64}, ::Symbol, ::Missing) at .\Base.jl:34
 setindex!(::Base.RefValue{Float64}, ::Missing) at .\refvalue.jl:33
 tilde_observe(::DynamicPPL.DefaultContext, ::DynamicPPL.SampleFromPrior, ::MvNormal{Float64,PDMats.ScalMat{Float64},Array{Float64,1}}, ::Array{Union{Missing, Float64},1}, ::DynamicPPL.VarName{:y,Tuple{}}, ::Tuple{}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}) at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\context_implementations.jl:90
 macro expansion at .\REPL:6 [inlined]
 ##evaluator#271(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Float64}, ::DynamicPPL.SampleFromPrior, ::DynamicPPL.DefaultContext) at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\compiler.jl:356
 Model at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\model.jl:136 [inlined]
 Model at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\model.jl:126 [inlined]
 VarInfo at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\varinfo.jl:110 [inlined]
 VarInfo at C:\Users\Jakob\.julia\packages\DynamicPPL\QgcLg\src\varinfo.jl:109 [inlined]
 transitions_from_chain(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}; sampler::DynamicPPL.SampleFromPrior) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:746
 predict(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}; include_all::Bool) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:674
 predict(::DynamicPPL.Model{var"###evaluator#271",(:x, :y),Tuple{Array{Float64,1},Array{Union{Missing, Float64},1}},(),DynamicPPL.ModelGen{var"###generator#272",(:x, :y),(),Tuple{}}}, ::Chains{Float64,Missing,NamedTuple{(:internals, :parameters),Tuple{Array{String,1},Array{String,1}}},NamedTuple{(),Tuple{}}}) at C:\Users\Jakob\.julia\packages\Turing\GMBTf\src\inference\Inference.jl:671
 top-level scope at REPL:1
``````

Here’s the code to generate data and fit the model:

``````using Distributions, Turing, StatsBase

x = rand(Normal(0, 1), 100);
f(x) = 2x
y = f.(x) + rand(Normal(0, 1), 100);

@model linreg(x, y) = begin
α ~ Normal(0,1)
β ~ Normal(0,5)
σ ~ Exponential(2)
μ = α .+ x*β
y ~ MvNormal(μ, σ)
end;

m = linreg(x, y);
chain = sample(m, NUTS(100, .65), 1000)
mpred = linreg(x, Vector{Union{Missing, Float64}}(undef, length(y)));
predict(mpred, chain)
``````

Not sure if I messed something up or if this is a bug.

Here’s my package info:

``````(@v1.4) pkg> status
Status `C:\Users\Jakob\.julia\environments\v1.4\Project.toml`
[7820620d] AgentsPlots v0.3.0
[c52e3926] Atom v0.12.10
[39de3d68] AxisArrays v0.4.3
[3895d2a7] CUDAapi v4.0.0
[5ae59095] Colors v0.12.1
[a93c6f00] DataFrames v0.21.0
[864edb3b] DataStructures v0.17.15
[5721bf48] DataVoyager v1.0.0
[31c24e10] Distributions v0.23.2
[634d3b9d] DrWatson v1.10.3
[587475ba] Flux v0.10.4
[a2cc645c] GraphPlot v0.3.1
[e5e0dc1b] Juno v0.8.1
[093fc24a] LightGraphs v1.3.1 #master (https://github.com/JuliaGraphs/LightGraphs.jl.git)
[c7f686f2] MCMCChains v3.0.12
[ee78f7c6] Makie v0.10.0
[eff96d63] Measurements v2.1.1
[0987c9cc] MonteCarloMeasurements v0.8.10
[5fb14364] OhMyREPL v0.5.5
[d96e819e] Parameters v0.12.1
[b98c9c47] Pipe v1.2.0
[91a5bcdd] Plots v1.2.3
[d330b81b] PyPlot v2.9.0
[ce6b1742] RDatasets v0.6.1
[295af30f] Revise v2.6.7
[2913bbd2] StatsBase v0.33.0
 StatsMakie v0.2.1
[f3b207a7] StatsPlots v0.14.6
[fce5fe82] Turing v0.13.0
``````

Thanks for the great work! I just updated a blog post I made with Turing to the latest release… `MCMCThreads()` and `Prior()` sampling as well. Exciting times for Turing… it’s really impressive.