Condition Turing model on a variable created within the model

Assuming I have the following variable [0, 1]

using Downloads, CSV, DataFrames, Random
using Turing, Distributions
using GLMakie
using StatsFuns: logistic

y = vcat(rand(Beta(2, 10), 200),
    rand(Uniform(0, 1), 100),
    rand(Beta(10, 2), 200))
hist(y)

image

I would like to know if it was possible, within a Turing model, to transform this variable (dichotomize it to 0 and 1 if y > 0.5), and then fit a logistic model to this dichotomized variable.

I tried the following:

@model function model_choice(y)
    # Prior on probability
    choice_intercept ~ Normal(0, 1)

    for i in 1:length(y)
        # Logistic function
        v = logistic(choice_intercept)

        # Transform to binary choice
        choice = ifelse(y[i] > 0.5, 1, 0)

        # Likelihood    
        choice ~ Bernoulli(v)
    end
end

fit = model_choice(y)
chains = sample(fit, NUTS(), 400)

While this successfully samples:

Summary Statistics
        parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
            Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64

  choice_intercept   -0.5054    0.0000    0.0000     1.4125    20.8078    2.0369        0.1158
            choice    1.0000    0.0000       NaN        NaN        NaN       NaN           NaN

I am not sure I am doing this correctly as 1) choice shows up in the results as a parameter and it probably shouldn’t be there; and 2) I can’t use predict() on that model:

pred = predict(model_choice([(missing) for i in 1:length(y)]), chains)
ERROR: MethodError: no method matching ifelse(::Missing, ::Int64, ::Int64)

Closest candidates are:
  ifelse(::Bool, ::Any, ::Any)
   @ Base essentials.jl:647

I am assuming I am not doing this correctly. What is the best way to do that? Thanks for any pointers!

This fails because missing is passed to ifelse which is not defined:

ifelse(missing, 0, 1)
ERROR: MethodError: no method matching ifelse(::Missing, ::Int64, ::Int64)

Since your model does not depend on y but only choice it’s probably easiest to pass it directly.

using Distributions, Turing

y = vcat(rand(Beta(2, 10), 200),
    rand(Uniform(0, 1), 100),
    rand(Beta(10, 2), 200))

choice = y .> 0.5

@model function choice_model(choice)
    choice_intercept ~ Normal(0, 1)
    
    for i in eachindex(choice)
        choice[i] ~ BernoulliLogit(choice_intercept)
    end
end

fit = choice_model(choice)
chains = sample(fit, NUTS(), 400)
Summary Statistics
        parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
            Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64 

  choice_intercept   -0.0185    0.0905    0.0063   211.6200   194.0690    1.0137       25.6292

Quantiles
        parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
            Symbol   Float64   Float64   Float64   Float64   Float64 

  choice_intercept   -0.1860   -0.0812   -0.0138    0.0509    0.1486
pred = predict(choice_model([missing for _ in 1:3]), chains)
Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Missing 

   choice[1]    0.5000    0.5006    0.0244   420.1271        NaN       NaN       missing
   choice[2]    0.5200    0.5002    0.0272   337.7291        NaN    1.0011       missing
   choice[3]    0.5175    0.5003    0.0251   397.3654        NaN    1.0017       missing

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

   choice[1]    0.0000    0.0000    0.5000    1.0000    1.0000
   choice[2]    0.0000    0.0000    1.0000    1.0000    1.0000
   choice[3]    0.0000    0.0000    1.0000    1.0000    1.0000

TLDR; How to, in a model predicting y, also estimate another parameter based on a derivative (a transformation done in the model) of y

Thanks! Unfortunately this is just a minimal working example, but in reality the model is slightly more complex, and creating a variable to condition (the “choice” here) inside the model would be necessary.

Regardless of the predict() method not working (as this might be caused by the model not returning anything) - which is a somewhat secondary issue, how to correctly define the variable inside, as the version below is probably wrong, as evidenced by the presence of the choice parameter in the output and very high and nonsensical parameter values for choice_intercept

In the following model, I also condition some other parameters on the outcome variables.

using Downloads, CSV, DataFrames, Random
using Turing, Distributions
using GLMakie
using StatsFuns: logistic

y = vcat(rand(Beta(2, 10), 200),
    rand(Uniform(0, 1), 100),
    rand(Beta(10, 2), 200))

@model function model_choice(y)
    choice_intercept ~ Normal(0, 1)
    μ ~ Normal(0.5, 0.2)
    σ ~ truncated(Normal(0, 1); lower=0)

    for i in 1:length(y)
        # Transform to binary choice
        choice = ifelse(y[i] > 0.5, 1, 0)
        # Likelihood    
        choice ~ BernoulliLogit(choice_intercept)
        y[i] ~ Normal(μ, σ)
    end
end

fit = model_choice(y)
chains = sample(fit, NUTS(), 400)

But the convergence is bad…

The proper way to use transformed data is to either transform it outside the model and pass it as an argument or for simple transformations like this code the transformation as the default value for the function argument:

@model function model_choice(y, choice = y .> 0.5)
    choice_intercept ~ Normal(0, 1)
    μ ~ Normal(0.5, 0.2)
    σ ~ truncated(Normal(0, 1); lower=0)

    for i in eachindex(y)
        choice[i] ~ BernoulliLogit(choice_intercept)
        y[i] ~ Normal(μ, σ)
    end
end

fit = model_choice(y)
chains = sample(fit, NUTS(), 400)

Whatever your approach, choice must be a part of the model function definition. Otherwise Turing.jl will treat it as a parameter and it will show up in your chains object.

Got it! Thanks :slight_smile: