# How to specify a Turing model that takes two vectors as input for a distribution that returns a tuple

The SequentialSamplingModels.jl package implements distributions that are used in psychology research. One such model, the Linear Ballistic Accumulator (LBA), is used to simultaneously model reaction time (RT) and choice (a binary variable).

The `LBA()` distribution returns a named tuple of two vectors:

``````
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra
using DataFrames

rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 3)
``````
``````
(choice = [1, 1, 1], rt = [0.4951585198314631, 0.4486594724069529, 0.5222694726662244])
``````

One can use such a distribution in Turing in a simple model as follows:

``````
data = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))

@model function model_lba(data)
    data = (choice=data.choice, rt=data.rt)
    min_rt = minimum(data.rt)

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    return (; data, drift, A, k, tau)
end

chain = sample(model_lba(data), Prior(), 50000)
``````

I would like to rewrite the model to accept two independent vectors as inputs.

My initial attempt was to make a simple change and capture the `LBA()` output as a tuple:

``````
@model function model_lba(choice, rt)
    min_rt = minimum(rt)

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    (choice, rt) ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    return (; choice, rt, drift, A, k, tau)
end
``````
``````
ERROR: LoadError: Malformed variable name (choice, rt)!
``````

But that seems incompatible with Turing (?). I then tried a workaround by creating the `data` tuple inside the model:

``````
@model function model_lba(choice, rt)
    data = (choice=choice, rt=rt)
    min_rt = minimum(rt)

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    return (; data.choice, data.rt, drift, A, k, tau)
end

chain = sample(model_lba(data.choice, data.rt), Prior(), 50000)
``````
``````
ERROR: MethodError: no method matching iterate(::LBA{Matrix{Float64}, Float64, Float64, Float64})
...
``````

But to no avail. Any ideas or recommendations are welcome!

(Note: this question was originally asked on the package’s repo, in the context of using `predict()` from the priors, but the author suggested asking the wider Turing community.)

IMO this is an issue with SequentialSamplingModels.jl. It sort of “lies” about what the distribution actually implements: the type indicates that it’s a `ContinuousUnivariateDistribution`, which, according to Distributions.jl’s design, should imply that `rand(dist)` returns a `Real`, not a `NamedTuple`.

This is why Turing.jl doesn’t know what to do with it. It just happens to work out when `data` is an observation because in that scenario, Turing.jl really only needs to be able to compute the `logpdf`.
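To make the mismatch concrete, here is a minimal sketch (a hypothetical `FakeSSM`, not SequentialSamplingModels.jl’s actual code) of a distribution that declares itself a `ContinuousUnivariateDistribution` but returns a `NamedTuple` from `rand`:

``````
using Distributions
using Random

# Hypothetical stand-in for LBA: claims to be univariate/continuous,
# but its samples are NamedTuples rather than Reals.
struct FakeSSM <: ContinuousUnivariateDistribution end

function Base.rand(rng::Random.AbstractRNG, d::FakeSSM)
    return (choice = rand(rng, 1:2), rt = 0.3 + rand(rng))
end

# The logpdf side can still be made to work, which is why `data ~ dist`
# succeeds for observations:
Distributions.logpdf(d::FakeSSM, data::NamedTuple) = logpdf(LogNormal(), data.rt)

d = FakeSSM()
rand(d)       # a single NamedTuple, violating the `Real` contract
# rand(d, 3)  # errors: the univariate fallbacks try to fill a Vector{Float64}
``````

Anything in Distributions.jl or Turing.jl that relies on the univariate contract, e.g. preallocating a `Vector{Float64}` for multiple draws, then breaks in much the same way as the `MethodError` above.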

If you really want to work around this, you can do something like:

``````
# This is using internals of Turing.jl, so we don't guarantee that this
# will continue to work between releases.
macro isobservation(variable)
    tmp = gensym(:isobservation)
    return esc(:($tmp = $(DynamicPPL.isassumption(variable)); !$tmp))
end

# Alternative constructor because we need `data` to occur in
# the arguments of the model to be considered an observation.
model_lba(choice, rt) = model_lba((choice=choice, rt=rt))
model_lba(data::DataFrame) = model_lba((choice=data.choice, rt=data.rt))
@model function model_lba(data)
    choice, rt = data.choice, data.rt
    min_rt = minimum(rt)

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    data_dist = LBA(; τ=tau, A=A, k=k, ν=drift)
    if @isobservation(data)
        # If `data` is an observation, we only need to add its log-likelihood.
        Turing.@addlogprob! loglikelihood(data_dist, data)
    else
        # If `data` is not considered an observation, we sample.
        data = rand(data_dist)
    end
    return (; data.choice, data.rt, drift, A, k, tau)
end
``````

Note that the above has the drawback of not being reproducible, since we’re calling `rand` without providing the `rng` that is being used; I can provide functionality for doing this, though, if desired.

EDIT: Also, this won’t be compatible with the construction of `Chains`, because Turing won’t record the values when you call `rand` instead of using `~`. So if you want to sample from the prior, you’ll need to manually call `model()` multiple times instead. For example:

``````
model_lba(data)()
``````

and then extract the desired variables from there.
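For example, something along these lines (a sketch; the comprehension and the number of draws are my additions, not from the thread):

``````
m = model_lba(data)

# Each call re-runs the model and returns whatever its `return` statement
# produces, with fresh draws for the non-observed variables.
draws = [m() for _ in 1:1_000]
taus = [draw.tau for draw in draws]
``````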

Interesting, thanks a lot. Indeed the author of SequentialSamplingModels.jl mentioned this issue.

Do you think it would be worth it to ask the Distributions.jl community for possible alternative classes more suitable for these models?

> Do you think it would be worth it to ask the Distributions.jl community for possible alternative classes more suitable for these models?

Distributions.jl does support some variate types that are not just simple arrays or reals, but my prior tells me that the probability of them adding support for `NamedTuple` in the near future is low.

@torfjelde, thank you for your insights. You are right that the type is misspecified. Unfortunately, I have not figured out a way to support mixed multivariate types: the choice is discrete whereas the reaction time is continuous. Is there a way to support something like a `MixedMultivariateDistribution`?

Alternatively, would a `Tuple` be easier to support than a `NamedTuple`? The other option is to return `Vector{Union{Int,Float64}}`, but I’m not sure whether that would cause performance issues.

No, that’s the same.

> The other option is to return `Vector{Union{Int,Float64}}`, but I’m not sure whether that would cause performance issues.

This would make it work with Turing.jl yes, but as you say, not sure if this is the way to go.

Honestly, the most straightforward approach would probably be to just implement the distribution as a Turing `@model`, and then you can use it within another model via `@submodel`.
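For what it’s worth, the basic mechanics look roughly like this (a sketch with hypothetical model names `priors`/`outer`, not an actual LBA implementation; it assumes the `@submodel` syntax exported by Turing via DynamicPPL at the time of writing):

``````
using Turing

# A component implemented as a model: it samples its own parameters
# and returns them to the caller.
@model function priors()
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    return (; μ, σ)
end

@model function outer(rts)
    # Splice `priors()` into this model; its random variables become part
    # of the joint model, and we get its return value back.
    @submodel params = priors()
    for i in eachindex(rts)
        rts[i] ~ LogNormal(params.μ, params.σ)
    end
end
``````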

Is there any documentation at all on how `@submodel` works? I feel like Turing needs an update to its documentation, given all of the things that have been going on behind the scenes. I would actually love to participate in building out that documentation, but unfortunately I don’t know enough about what’s changed to do the work. However, maybe a thread here on the forum on things that need to get changed, with some quick discussion of how they work, could induce some people who are adept at writing documentation to take a stab at it. I know I certainly would.


I modified the example above and was able to get `sample` to work without the missing-method error. However, `predict` did not work, an issue which was discussed here before in some form: it just uses the same data that was input into the model. But I am not sure how to apply that solution to the present example. I tried different data containers, such as a matrix and a `Tuple`, but the result was the same. Would you be able to look at the most recent version of the code if you have a moment, please?

``````
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra

# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
data = rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 10)

@model function model_lba(data)
    min_rt = minimum(data.rt)

    # Priors
    drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    tau ~ Uniform(0.0, min_rt)

    # Likelihood
    data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
    println("data $data")
    return (; data..., drift, A, k, tau)
end

# this does work now
chain = sample(model_lba(data), Prior(), 100)
# this does not work; if you rerun predict, it does generate data, but it's the same as the input
predictions = predict(model_lba(data), chain)
``````

versions

``````
SequentialSamplingModels v0.3.3
Turing v0.26.0
``````

Yes, I was about to ask the same question as dlakelan.

That would be awesome! And yes, there has been a lot of stuff going on behind the scenes over the past year, so we’re in dire need of updating the docs. One improvement on the docs side is that each of the separate packages now has its own documentation, e.g. Home · DynamicPPL, meaning that documentation for these things at least exists somewhere. For example, if you just look at the docstring for `@submodel`, it should be fairly well explained. But we really need to a) update the tutorials to use new best practices, and b) show off some of the new features.

How about I make an issue on Turing.jl about “new stuff” and how we need docs for it? Then we can discuss there :) I’ll try to be as supportive as possible throughout, too.


But here too, the samples for `data` do not end up in the resulting `chain`.

Gotcha. I was not aware that the data was supposed to be in the chain too. Hmmm. I wonder why it didn’t work with the other data containers.

In any case, could you please provide some guidance on the submodel approach or point us to a resource? Thanks!

We can discuss the doc-issue further there.

`?@submodel` should be a good starting point. But if you can give me, say, the `LBA` model implemented as a `@model`, I can replicate the above example using `@submodel` as a demo.

Thanks. I looked through the docstrings, but I still have some questions about what it means to implement a model as a `@model`. However, before I get too far into the details, I wanted to ask you about the code below. I adapted Dominique’s example for a Wald model (basically a reparameterized `InverseGaussian`), which is actually a subtype of `ContinuousUnivariateDistribution`:

``````
julia> dist = Wald(ν = 2.0, α=1.0, θ=.3); rand(dist)
0.5196099061252006
``````

If that was the problem (or the sole problem), I would expect the code below to work. For some reason, it does not. I want to bring that up to make sure we are addressing the correct issue. In addition, the same thing happens if I use `InverseGaussian` directly instead of `Wald`.

Wald

``````
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra

# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
rts = rand(Wald(ν=2.0, α=0.8, θ=0.3), 10)

@model function model_wald(rts)
    min_rt = minimum(rts)

    ν ~ truncated(Normal(2, 1), 0.0, Inf)
    α ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    θ ~ Uniform(0.0, min_rt)

    # Likelihood
    rts ~ Wald(; ν, α, θ)
    return (; rts, ν, α, θ)
end

chain = sample(model_wald(rts), Prior(), 100)

predictions = predict(model_wald(rts), chain)
``````

Inverse Gaussian

``````
using Turing
using Distributions
using Random
using LinearAlgebra

# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
rts = rand(InverseGaussian(1.0, 0.8), 10)

@model function model_IG(rts)
    ν ~ truncated(Normal(2, 1), 0.0, Inf)
    α ~ truncated(Normal(0.8, 0.4), 0.0, Inf)

    # Likelihood
    rts ~ InverseGaussian(ν, α)
    return (; rts, ν, α)
end

chain = sample(model_IG(rts), Prior(), 100)

predictions = predict(model_IG(rts), chain)
``````

What exactly do you mean when you say “it doesn’t work”? It shouldn’t be the same error as we saw before. If you mean that it won’t sample `rts` when you call `predict`, then this is (possibly unfortunately) how `predict` is intended to work (see predict() from a Prior-sampled chain returns no predictions · Issue #2012 · TuringLang/Turing.jl · GitHub for an explanation).

In short, if you now want to sample `rts`, you need to do:

``````
predictions = predict(model_IG(missing), chain)
``````

This won’t work in our case since you depend on `minimum(rts)` in your model, so the full example would be

``````
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra

# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
rts = rand(Wald(ν=2.0, α=0.8, θ=0.3), 10)

@model function model_wald(rts; min_rt=minimum(rts))
    ν ~ truncated(Normal(2, 1), 0.0, Inf)
    α ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    θ ~ Uniform(0.0, min_rt)

    # Likelihood
    rts ~ Wald(; ν, α, θ)
    return (; rts, ν, α, θ)
end

chain = sample(model_wald(rts), Prior(), 100)

predictions = predict(model_wald(missing; min_rt=minimum(rts)), chain)
``````

Though running this code seems to fail because `Wald` doesn’t have `rand` implemented properly:

``````
julia> dist
Wald
┌───────────┬───────┐
│ Parameter │ Value │
├───────────┼───────┤
│ ν         │  1.36 │
│ α         │  0.74 │
│ θ         │  0.10 │
└───────────┴───────┘

julia> rand(dist)
0.39546979136012583

julia> # Passing `rng` is not supported!
rand(Random.default_rng(), dist)
ERROR: MethodError: no method matching iterate(::Wald{Float64, Float64, Float64})

Closest candidates are:
iterate(::Union{LinRange, StepRangeLen})
@ Base range.jl:880
iterate(::Union{LinRange, StepRangeLen}, ::Integer)
@ Base range.jl:880
iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}}
@ Base dict.jl:698
...

Stacktrace:
 copyto!(dest::Vector{Float64}, src::Wald{Float64, Float64, Float64})
@ Base ./abstractarray.jl:946
 _collect(cont::UnitRange{Int64}, itr::Wald{Float64, Float64, Float64}, #unused#::Base.HasEltype, isz::Base.HasLength)
@ Base ./array.jl:713
 collect(itr::Wald{Float64, Float64, Float64})
@ Base ./array.jl:707
 quantile(itr::Wald{Float64, Float64, Float64}, p::Float64; sorted::Bool, alpha::Float64, beta::Float64)
@ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:1070
 quantile(itr::Wald{Float64, Float64, Float64}, p::Float64)
@ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:1070
@ Distributions ~/.julia/packages/Distributions/uREy8/src/univariates.jl:156
 top-level scope
@ REPL:1
``````

Got it. Sorry. I misunderstood the nature of the problem.

Absolutely no worries :)


To follow up, the solution was quite simple once I figured out the failure point. Basically, I needed to define a mixed multivariate type and implement `DynamicPPL.vectorize(d::MixedMultivariateDistribution, r::NamedTuple) = [r...]`. A simple working example can be found below for anyone who is interested.

Summary
``````
using Distributions
using Turing
using Random
import Distributions: logpdf
import Distributions: loglikelihood
import Distributions: rand
import DynamicPPL: vectorize
import Base: length

abstract type Mixed <: ValueSupport end

const MixedMultivariateDistribution = Distribution{Multivariate, Mixed}

abstract type SSM1D <: ContinuousUnivariateDistribution end

abstract type SSM2D <: MixedMultivariateDistribution end

struct MyType{T<:Real} <: SSM2D
    μ::T
    σ::T
end

vectorize(d::MixedMultivariateDistribution, r::NamedTuple) = [r...]

Base.length(d::MixedMultivariateDistribution) = 2

rand(d::MixedMultivariateDistribution) = rand(Random.default_rng(), d)
rand(d::MixedMultivariateDistribution, n::Int) = rand(Random.default_rng(), d, n)

function rand(rng::AbstractRNG, d::MyType)
    choice = rand(rng, 1:2)
    rt = rand(rng, LogNormal(d.μ, d.σ))
    return (; choice, rt)
end

function rand(rng::AbstractRNG, d::MyType, N::Int)
    choice = fill(0, N)
    rt = fill(0.0, N)
    for i in 1:N
        choice[i], rt[i] = rand(rng, d)
    end
    return (choice=choice, rt=rt)
end

function logpdf(d::MyType, choice::Int, rt::Float64)
    return logpdf(LogNormal(d.μ, d.σ), rt)
end

logpdf(d::MyType, data::NamedTuple) = logpdf(d, data.choice, data.rt)

loglikelihood(d::MyType, data::NamedTuple) = sum(logpdf.(d, data...))

@model function my_model(data)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    data ~ MyType(μ, σ)
    return (; data, μ, σ)
end

data = rand(MyType(0, 1), 10)
chain = sample(my_model(data), NUTS(), 1_000)

predictions = predict(my_model(missing), chain)
``````
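If it helps, here is a quick sanity check of the individual pieces (reusing the definitions above; exactly which of these Turing relies on internally is my reading, not something taken from the docs):

``````
d = MyType(0.0, 1.0)
draw = rand(d)       # NamedTuple with fields `choice` and `rt`
length(d)            # 2, used when DynamicPPL vectorizes the variable
vectorize(d, draw)   # two-element vector `[draw.choice, draw.rt]`
logpdf(d, draw)      # scalar log-density
``````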

@DominiqueMakowski, can you please mark this as the solution? Thanks!
