Posterior prediction in Turing

I’m looking to replicate the Stan snippet below in Turing. I’m having a hard time figuring out the syntax to adapt the Turing predict() example to my logistic regression model and translating the Stan code to Julia/Turing. The issue seems to be translating the undefined dependent variable, y, from the linear regression example into an equivalent for my logistic case. If I simply use y = Vector{Union{Missing, Int64}}(undef, n), converting the example’s Float64 to Int64 to match the binary 0/1 outcome of a logistic model, I get a MethodError because there is no Bernoulli method for Union{Missing, Int64}. I tested the linear model example and it works on my version of Turing.
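For reference, here is a minimal sketch of the pattern that triggers the error for me (the toy model and names here are illustrative, not my actual model):

using Turing

@model function bern_toy(y, n)
    p ~ Beta(1, 1)
    for i in 1:n
        y[i] ~ Bernoulli(p)
    end
end

n = 10
y = Vector{Union{Missing, Int64}}(undef, n)  # all-missing response, as in the predict() docs
m = bern_toy(y, n)  # this pattern is where the MethodError shows up for me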

# Posterior_epred returns the posterior estimates for the different subgroups stored in the
# poststrat_df dataframe. 
epred_mat <- posterior_epred(fit, newdata = poststrat_df, draws = 1000)
mrp_estimates_vector <- epred_mat %*% poststrat_df$n / sum(poststrat_df$n)
mrp_estimate <- c(mean = mean(mrp_estimates_vector), sd = sd(mrp_estimates_vector))
cat("MRP estimate mean, sd: ", round(mrp_estimate, 3))

I’m working from the predict() example in Inference.jl; the key code from that example is:

m_test = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)), σ);
predictions = predict(m_test, chain_lin_reg)
ys_pred = vec(mean(Array(group(predictions, :y)); dims = 1));

My Julia/Turing code to reproduce the Stan snippet is:

y = Vector{Union{Missing, Int64}}(undef, n) # Doesn't work (MethodError), though it follows the Turing predict() example in the Inference.jl docs
y = ones(n) # This works as a placeholder for y
post_model = varying_intercept(
        state_idx, eth_idx, male_eth_idx, educ_idx, educ_age_idx, educ_eth_idx, y, X, n, standardize = false
) # The model runs on the training data set
epred_mat = predict(post_model, chains)

vec(mean(Array(group(epred_mat, :y)); dims = 1)) # Not the correct output

Can you post a runnable example of your problem? That would help others provide a solution.

Putting together a runnable example, I saw that the Turing case study for logistic regression has a prediction function I could adapt. The example can be run after installing RDatasets.jl: https://turing.ml/dev/tutorials/02-logistic-regression/

This gets me most of the way to replicating my Stan example. The output of Stan’s posterior_epred is an N × M matrix, where N is the number of posterior draws and M is the number of observations in the prediction data. A simplified version of my function, based on the logistic example and excluding variables that aren’t germane to the problem, is as follows:

function prediction(X::Matrix, chains; predictors = size(X, 2))
    # Mean intercept across all chains and iterations
    inter = DataFrame(mean(group(chains, :inter)))[!, :mean]
    # Mean parameter values for the betas
    β = DataFrame(mean(group(chains, :β)))[!, :mean]

    # Linear predictor, then inverse-logit to probabilities
    v = inter .+ X * β
    return logistic.(v)
end;

I call this with:

sub_chains = chains[1001:end]
post_model = prediction(X, sub_chains)

This gives output of the correct format, but it isn’t quite right. The issue is that mean (and its parent function summarize()) aggregates across both chains and samples. I need the mean across chains only, so β and inter should each be 1000 × 1 vectors rather than scalars. Then logistic.(v) would return a matrix of probabilities, or I could loop through samples to generate vectors of probabilities. My thought was that there might be a way to restrict mean() to aggregate across chains but not samples, but I couldn’t find one in the source code. Any ideas (probably from the developers of that function)?
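One possibility I’m exploring is the append_chains keyword of MCMCChains’ Array(), which returns one iterations × parameters matrix per chain; those can then be averaged element-wise to get chain-averaged draws. A rough sketch (untested against my full model), assuming inter and β as above:

using Statistics
using StatsFuns: logistic

# One iterations × parameters matrix per chain
β_per_chain = Array(group(sub_chains, :β); append_chains = false)
inter_per_chain = Array(group(sub_chains, :inter); append_chains = false)

# Element-wise mean over the vector of per-chain matrices → chain-averaged draws
β_draws = mean(β_per_chain)               # iterations × predictors
inter_draws = vec(mean(inter_per_chain))  # iterations

V = inter_draws .+ β_draws * X'  # iterations × observations linear predictors
P = logistic.(V)                 # iterations × observations probabilities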

I’m not exactly sure if the following code does what you are looking for, but I thought I’d provide it just in case. I usually convert the chain to a DataFrame because I find it easier to work with. So, using the logistic regression tutorial, I would do something like this:

m = logistic_regression(train, train_label, n, 1)
chain = sample(m, NUTS(), MCMCThreads(), 1000, 4)
chaindf = DataFrame(chain)

This gives me a DataFrame that looks like this:

4000×18 DataFrame
  Row │ iteration  chain  intercept  student     balance  income      lp        n_steps  is_accept  acceptance_rate  log_density  hamiltonian_energy ⋯
      │ Int64      Int64  Float64    Float64     Float64  Float64     Float64   Float64  Float64    Float64          Float64      Float64            ⋯
──────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │       501      1   -4.78973   0.440116   1.8584    0.486572   -56.5809      7.0        1.0         0.962241     -56.5809             56.9433 ⋯
    2 │       502      1   -4.87119   0.811197   1.98672   0.385192   -58.3785      3.0        1.0         0.893347     -58.3785             58.8032  
    3 │       503      1   -4.14025  -1.30256    1.83855  -0.161454   -56.879      23.0        1.0         0.987357     -56.879              59.4473  
    4 │       504      1   -4.45475  -0.122842   1.97072   0.35409    -55.7659      7.0        1.0         0.956346     -55.7659             57.3812  
    5 │       505      1   -4.453    -0.654141   1.87193   0.693527   -57.6155      3.0        1.0         0.733881     -57.6155             58.1016 ⋯
    6 │       506      1   -4.01181  -0.810257   1.89205   0.419372   -57.1889      7.0        1.0         0.994691     -57.1889             57.7374  
  ⋮   │     ⋮        ⋮        ⋮          ⋮          ⋮         ⋮          ⋮         ⋮         ⋮             ⋮              ⋮               ⋮          ⋱
 3996 │      1496      4   -4.01397   0.274545   1.27778   0.783362   -59.1164      3.0        1.0         1.0          -59.1164             60.3322  
 3997 │      1497      4   -4.01407  -0.260389   1.92296   0.460915   -57.7695      7.0        1.0         0.698575     -57.7695             63.512   
 3998 │      1498      4   -4.19074  -0.636925   1.59508   0.471109   -56.8324      7.0        1.0         0.996946     -56.8324             58.3185 ⋯
 3999 │      1499      4   -4.66654  -0.347037   2.1733    0.467141   -56.4114      7.0        1.0         0.953491     -56.4114             58.7883  
 4000 │      1500      4   -4.84313  -0.751735   2.0547    0.0107546  -57.1793     15.0        1.0         0.921095     -57.1793             57.7937  
                                                                                                                       6 columns and 3989 rows omitted

Then, I would proceed with something like this:

# One probability per (posterior draw, test observation) pair
p = [
    logistic(row.intercept + row.student * test[i, 1] + row.balance * test[i, 2] + row.income * test[i, 3])
    for row in eachrow(chaindf), i in 1:size(test, 1)
]

4000×9500 Matrix{Float64}:
 0.465484  0.00344039  0.00508513  0.00137203   0.261555  0.000450104  …  0.00421789  0.00069823   0.00792776  0.314582  0.0921833  0.013099
 0.488537  0.0032657   0.00365576  0.000998941  0.298431  0.000396564     0.00399184  0.000555768  0.00674533  0.476042  0.151211   0.0193158
 0.440289  0.0101318   0.00349736  0.00157082   0.400781  0.001867        0.0114167   0.0015384    0.0104211   0.229439  0.058257   0.00721661        
 0.574727  0.00507123  0.00532339  0.00150388   0.384724  0.00063604      0.00616453  0.000864896  0.0100408   0.35092   0.0966909  0.0118373
 0.611662  0.00417768  0.00978534  0.0022429    0.335778  0.000487701     0.00526206  0.000925815  0.0124358   0.150412  0.038205   0.00529938        
 0.654961  0.00772059  0.00966926  0.00271075   0.452906  0.00101429   …  0.00940061  0.00147208   0.0163597   0.248997  0.0660443  0.00869838        
 0.544748  0.00612234  0.00897271  0.00257814   0.335795  0.000884618     0.00743218  0.00134973   0.0135882   0.326597  0.103618   0.0164337
 0.351807  0.00370288  0.00620632  0.00185623   0.18436   0.000599739     0.00447022  0.000950212  0.00846804  0.154772  0.0469492  0.00833413        
 ⋮                                                        ⋮            ⋱              ⋮
 0.422075  0.00426458  0.0135892   0.00354346   0.187335  0.0006838       0.00527318  0.00140718   0.0133127   0.367753  0.15215    0.0358649
 0.509708  0.00680076  0.0312533   0.00795952   0.222816  0.00117337      0.00846254  0.00278392   0.0247064   0.145091  0.0568705  0.0149864
 0.419215  0.00726688  0.0258561   0.00754298   0.18959   0.0014455    …  0.0088408   0.00302761   0.0223092   0.145294  0.0598602  0.0168975
 0.679729  0.0074264   0.0100926   0.00269873   0.467505  0.000927589     0.0091118   0.00140119   0.016628    0.369742  0.109152   0.0145805
 0.465057  0.00682179  0.0107435   0.00336271   0.274343  0.00116335      0.00817516  0.00178612   0.014866    0.139324  0.0427343  0.00780718        
 0.659689  0.00358769  0.00450103  0.00105151   0.431728  0.000349499     0.00449446  0.000528723  0.00839054  0.325864  0.0756088  0.0073663
 0.431515  0.00421881  0.00197272  0.000694956  0.33773   0.00057689      0.00494311  0.000559974  0.00552086  0.254671  0.056975   0.00570127

Finally,

threshold = 0.07
predictions = [mean(c) ≥ threshold ? 1 : 0 for c in eachcol(p)]

9500-element Vector{Int64}:
 1
 0
 0
 0
 1
 0
 0
 0
 ⋮
 0
 0
 0
 0
 1
 1
 0

Thanks! Recognizing that converting the chain to a DataFrame flattens its samples × parameters × chains Array into 2D was the key. Looping over (i, j) pairs is fine. This is exactly what I needed to replicate the Stan implementation.
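For completeness, the last step to mirror the Stan snippet is just the weighted average over the poststratification counts; a sketch, assuming p is the draws × observations matrix from above and poststrat_df has a count column n as in the Stan code:

using Statistics

# One MRP estimate per posterior draw: weighted average over subgroup counts
mrp_estimates_vector = p * poststrat_df.n ./ sum(poststrat_df.n)
mrp_estimate = (mean = mean(mrp_estimates_vector), sd = std(mrp_estimates_vector))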
