How to make a model applicable to Turing.jl?

RPS · May 5, 2022, 7:35pm

I’ve implemented a model and would like to use Turing.jl to evaluate the posterior distribution of its parameters. The problem is that the way I store the model parameters (around 20-40, depending on the model input) does not allow Turing.jl to be applied directly, because the package requires my model to accept variables of type ForwardDiff.Dual (if my interpretation of the error message is correct). Currently, I’m storing the parameters within a dictionary and in a DataFrame object that is contained in this dictionary.

My question is: What data structure and types should I use instead to allow for the usage of Turing.jl?

nilshg · May 5, 2022, 8:01pm

No problems storing duals in either Dicts or DataFrames:

julia> using DataFrames, ForwardDiff

julia> DataFrame(x = [ForwardDiff.Dual(0, 1)], y = [Dict(1 => ForwardDiff.Dual(1, 2))])
1×2 DataFrame
 Row │ x                   y                                 
     │ Dual…               Dict…                             
─────┼───────────────────────────────────────────────────────
   1 │ Dual{Nothing}(0,1)  Dict{Int64, ForwardDiff.Dual{Not…

so you’ll have to give us an MWE to help.
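
For context, a ForwardDiff.Dual error often comes from an element type that is hard-coded to Float64 rather than from the kind of container. The following is a minimal sketch of that situation, assuming only ForwardDiff is installed; both helper functions are hypothetical and not taken from the original model.

```julia
using ForwardDiff

# Hypothetical helper: the work buffer is hard-coded to Float64.
function sum_squares_fixed(p)
    buf = zeros(Float64, length(p))
    for i in eachindex(p)
        buf[i] = p[i]^2   # a Dual cannot be converted to Float64 here
    end
    return sum(buf)
end

# Same computation, but the buffer takes its element type from the input.
function sum_squares_generic(p)
    buf = zeros(eltype(p), length(p))
    for i in eachindex(p)
        buf[i] = p[i]^2
    end
    return sum(buf)
end

p = [1.0, 2.0, 3.0]

# Works: the buffer holds Duals while ForwardDiff differentiates.
grad = ForwardDiff.gradient(sum_squares_generic, p)

# Fails: storing a Dual into a Vector{Float64} raises a MethodError.
fixed_failed = try
    ForwardDiff.gradient(sum_squares_fixed, p)
    false
catch err
    err isa MethodError
end
```

The same reasoning applies to a Dict or a DataFrame column: if it was created with Float64 values, it will not accept Duals later.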

RPS · May 6, 2022, 1:04pm

Thanks for your quick answer nilshg. This is good to know!

I tried to make an MWE of my problem:

using Distributions
using FillArrays
using StatsPlots
using LinearAlgebra
using Random
using Parameters
using Turing
using MCMCChains
using Roots
using DataFrames
using LabelledArrays

# Set a random seed to ensure reproducibility of the results
Random.seed!(1)

function toy_function(y, p)
    (x,params) = p
    return x.^2 .* params.value[1] .+ y .* params.value[2] .+ params.value[3]
end

function toy_model(x, params)
    sol = zeros(eltype(params.value), length(x))
    for (idx, x_value) in enumerate(x)
        zero_problem = ZeroProblem(toy_function, zero(eltype(params.value)))
        sol[idx] = solve(zero_problem, Order2(), p=(x_value,params), maxevals=1000)
    end
    return sol
end

# Define prior distributions
param_prior_distributions = DataFrame(name=["a","b","c"],
                                      distr=[Normal(3,10), Normal(1,2), Normal(3,4)])
true_parameter_values = DataFrame(name=["a","b","c"],
                                  value=mean.(param_prior_distributions.distr))

# Define the measurement model
normalmodel = Normal(0, 1)

# Evaluate synthetic measurement data
Nx = 100
x_values = collect(range(-2, 2, length=Nx))
y_values = toy_model(x_values, true_parameter_values) + rand(normalmodel, Nx)

# Plot measurement data
scatter(x_values, y_values; legend=false, title="Synthetic Dataset")

@model function turing_toy_model(x, y, param_prior_distributions)

    params = DataFrame(name=["a","b","c"], value=[0.0, 0.0, 0.0])
    for (idx, _) in enumerate(param_prior_distributions.distr)
        params.value[idx] ~ param_prior_distributions.distr[idx]
    end

    mu = toy_model(x, params)
    N = length(x)
    for i in 1:N
        y[i] ~ Normal(mu[i], std(normalmodel))
    end
    return y

end

my_turing_toy_model = turing_toy_model(x_values, y_values, param_prior_distributions);

# perform sampling
nsamples = 100
chain = sample(my_turing_toy_model, NUTS(0.65), nsamples);

describe(chain)
plot(chain)

a_posteriori_mean_values = describe(chain)[1][:,:mean];
a_posteriori_std_values = describe(chain)[1][:,:std];

println("true parameter values: ", true_parameter_values)
println("a_posteriori_mean_values:", a_posteriori_mean_values)
println("a_posteriori_std_values:", a_posteriori_std_values)

y_values_exact = toy_model(x_values, true_parameter_values)
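# Prior prediction, assuming the prior means as parameter values
prior_mean_values = DataFrame(name=["a","b","c"], value=mean.(param_prior_distributions.distr))
y_values_prior = toy_model(x_values, prior_mean_values)
df_a_posteriori_mean_values = DataFrame(name=["a","b","c"], value=a_posteriori_mean_values)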
y_values_posterior = toy_model(x_values, df_a_posteriori_mean_values)

plot(x_values, y_values_exact; legend=:bottomright, title="Model Predictions", label=["Exact values"])
scatter!(x_values, y_values_prior, label=["Prior"])
scatter!(x_values, y_values_posterior, label=["Posterior"])

Unfortunately, the above example has another issue: when I try to use a DataFrame inside the @model function, I get the error:

ArgumentError: column name :columns not found in the data frame
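
One possible way around this, sketched below, is to draw the parameters into a plain vector whose element type Turing can choose, and to build any DataFrame or Dict from that vector only afterwards. This is an assumption about what would help, not a diagnosis of the `:columns` error. Note also that the `value` column created with `[0.0, 0.0, 0.0]` in the MWE is a `Vector{Float64}`, which cannot hold Duals. The model name `vector_param_model` and the closed-form mean (used instead of the Roots-based `toy_model`) are hypothetical simplifications; the `::Type{T}=Float64` argument follows the pattern described in the Turing.jl documentation.

```julia
using Turing, Random

Random.seed!(1)

# Simplified stand-in for the toy model: y = a*x^2 + b*x + c plus unit-variance noise.
priors = [Normal(3, 10), Normal(1, 2), Normal(3, 4)]
x = collect(range(-2, 2, length=50))
y = 3 .* x.^2 .+ 1 .* x .+ 3 .+ randn(length(x))

@model function vector_param_model(x, y, priors, ::Type{T}=Float64) where {T}
    # The element type T is supplied by Turing, so the vector can hold Duals.
    θ = Vector{T}(undef, length(priors))
    for i in eachindex(priors)
        θ[i] ~ priors[i]
    end
    for i in eachindex(x)
        y[i] ~ Normal(θ[1] * x[i]^2 + θ[2] * x[i] + θ[3], 1.0)
    end
end

chain = sample(vector_param_model(x, y, priors), NUTS(0.65), 100)
```

If the real model needs a DataFrame, it could be constructed inside the model from `θ` (for example `DataFrame(name=["a","b","c"], value=θ)`), so that the column inherits the element type of `θ`. Whether the Roots-based solver then accepts Duals is a separate question that this sketch does not cover.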
