Performance issues sampling a multinomial logit (slightly modified) with Turing.jl

Hi all,

I’m essentially trying to sample from a Bayesian multinomial logit with a few complexities: choices have characteristics, and a choice may not be available in every time period (this is not a time series). Choice probabilities look something like s_jt = softmax(d + X * beta), adjusted for availability (spelled out after the list of priors). X is the feature matrix of size J products × L features and includes a constant, so I use the priors:

  • for the covariance of d: Omega ~ LKJ(J, 2.0), thus d ~ MvNormal(zeros(J), Omega)
  • for beta: beta ~ MvNormal(ones(L), diagm(10 * ones(L)))
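
Spelled out (with a_jt ∈ {0, 1} the availability indicator, product 0 the always-available outside option whose utility is normalised to zero, and δ drawn per period in the code below), the adjusted probability is

$$
s_{jt} = \frac{a_{jt}\,\exp(\delta_{jt} + x_j' \beta)}{1 + \sum_{k=1}^{J} a_{kt}\,\exp(\delta_{kt} + x_k' \beta)}
$$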

I then loop over time periods and compute the choice probabilities with something like:

for t in 1:T
    observed_qjt[:, t] ~ Multinomial(mkt_size, s_jt)  # s_jt: availability-adjusted probabilities for period t
end

The code seems to work in the sense that I don’t get any errors and the sampling procedure begins, but progress stays stuck at 0% even when I reduce the number of time periods. To be precise about the dimensions of the problem: I have J = 45 choices, T = 1024 time periods and L = 4 features.

I have something that’s close to an MWE if you need it, but the matrices are quite big, so I’m not sure what the best way to share them is.

mAV = BitMatrix(vcat(data[:mAV], ones(1, size(data[:mAV], 2))))[:, mkts]  # availability, with a row of ones appended for the always-available outside option
mX = Matrix{Float64}(data[:mX])  # product features (J × L, includes a constant)
mQ = Matrix{Int64}(round.(data[:mQ]))[:, mkts]  # observed choice counts per product and period

# Priors
J = size(mX, 1)
T = size(mQ, 2)
β₀ = ones(size(mX, 2))
Vᵦ = diagm(10 * β₀)

# Settings for the Hamiltonian Monte Carlo HMC sampler
iterations = 1_000 #Number of samples
ϵ = 0.1
τ = 5

@model function estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ)

    # Instantiate intermediate objects
    δ = Matrix{Real}(undef, J, T)
    
    # Priors
    Ω_prod ~ LKJ(J, 2.0) # on δⱼ
    β ~ MvNormal(β₀, Vᵦ) # L chars

    # Define parameters
    δ .~ MvNormal(zeros(J), Symmetric(Ω_prod))

    # Market specific probabilities
    for t in 1:T
        # Extract availability
        av = mAV[:, t]

        # Instantiate data holders
        u_jt = zeros(eltype(mX[1][1] * β), J+1) 
        s_jt = zeros(eltype(mX[1][1] * β), J+1)

        # Compute utility
        u_jt[1:J] = mX * β + δ[:, t]
        u_jt[end] = 0.0

        # Compute choice probabilities for available prods
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

# Sampling
# chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), HMC(ϵ, τ), iterations)
chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), iterations)

Here is a short MWE with much smaller dimensions; there everything seems to work fine. Could it really just be an issue of dimensionality? An MLE of this model in the frequentist sense takes less than a minute on the full dataset.

using Turing
using Distributions
using LinearAlgebra
using NNlib: softmax

# Data
mAV = BitMatrix([0 1 1; 1 1 0; 1 1 1]) # availability of prod j at time t

mX = Matrix{Float64}([1.0 0.0; 1.0 0.126701]) # features

mQ = Matrix{Int64}([0 18 68; 1 53 0; 223 1524 1358]) # observed choices

# Priors
J = size(mX, 1)
T = size(mQ, 2)
β₀ = ones(size(mX, 2))
Vᵦ = diagm(10 * β₀)

# Settings for the Hamiltonian Monte Carlo HMC sampler
iterations = 1_000 #Number of samples
ϵ = 0.1
τ = 5

@model function estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ)

    # Instantiate intermediate objects
    δ = Matrix{Real}(undef, J, T)

    # Priors
    Ω_prod ~ LKJ(J, 2.0) # on δⱼ
    β ~ MvNormal(β₀, Vᵦ) # L chars

    # Define parameters
    δ .~ MvNormal(zeros(J), Symmetric(Ω_prod))

    # Market specific probabilities
    for t in 1:T
        # Extract availability
        av = mAV[:, t]

        # Instantiate data holders
        u_jt = zeros(eltype(mX[1][1] * β), J + 1)
        s_jt = zeros(eltype(mX[1][1] * β), J + 1)

        # Compute utility
        u_jt[1:J] = mX * β + δ[:, t]
        u_jt[end] = 0.0

        # Compute choice probabilities for available prods
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

# Sampling
# chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), HMC(ϵ, τ), iterations)
chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), iterations)

Looks like it is. Can you check the scaling behaviour (i.e. increase each dimension by 1, 2, 3… or something sensible along those lines)?
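
For example, a quick sweep like this would do (make_data here is a hypothetical helper standing in for however you build the inputs at a given size):

using Printf

for (J, T, L) in [(2, 3, 2), (3, 4, 3), (4, 5, 4)]
    mAV, mX, mQ, β₀, Vᵦ = make_data(J, T, L)  # hypothetical helper, not defined in this thread
    elapsed = @elapsed sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), 1_000)
    @printf "J = %d, T = %d, L = %d: %.1f s\n" J T L elapsed
end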

Sure, here goes:

  • J = 2, T = 3, L = 2 takes around 20-30s for 1000 iterations
  • J = 8, T = 3, L = 2 takes around 670s for 1000 iterations
  • J = 2, T = 9, L = 2 takes around 190s for 1000 iterations
  • J = 2, T = 3, L = 4 takes around 70s for 1000 iterations
  • J = 8, T = 9, L = 4 takes around 3400s for 1000 iterations

Seems to me that J and T are the largest bottlenecks, which is a shame since these are the two that get scaled up the most in the full dataset: J = 45 and T = 1024, so δ alone would be 45 × 1024 ≈ 46,000 latent parameters, on top of the 45 × 45 correlation matrix…

Okay, looks like a standard performance issue then. I know hoffe posted on Slack already, which is where most of the people who know how to speed up Turing models hang out, so maybe share this there as well and see if anyone has performance tips. Sorry I can’t be of more help, but I’ve never really used Turing, and while I’ve seen dozens of Turing performance discussions the solutions always seem a bit like magic to me…


Right, thanks anyway @nilshg!

Found something somewhat related about how to define priors for intercepts in such models. Applying it to the deltas in my model has sped up the whole thing a bit (nothing close to being usable at the dimensions of the full problem yet). Here’s the link; this post might also make me switch to Stan to compare.
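
In case it’s useful, here is roughly what that change looks like on my end: a sketch of a non-centered parameterisation of the δ draws (my adaptation of the idea, not the exact code from that post). Standard-normal innovations are scaled by the Cholesky factor of Ω_prod, so each column of δ is distributed MvNormal(0, Ω_prod) without the Matrix{Real} buffer or the broadcasted ~:

using Turing, Distributions, LinearAlgebra
using NNlib: softmax

@model function estimateBayesSharesDiv_logit_nc(mAV, mX, mQ, J, T, β₀, Vᵦ)
    # Priors
    Ω_prod ~ LKJ(J, 2.0)
    β ~ MvNormal(β₀, Vᵦ)

    # Non-centered δ: J × T standard-normal innovations, scaled so that
    # δ[:, t] ~ MvNormal(zeros(J), Ω_prod)
    z ~ filldist(Normal(), J, T)
    δ = cholesky(Symmetric(Ω_prod)).L * z

    for t in 1:T
        av = mAV[:, t]

        # Utility, with the outside option normalised to zero
        u_jt = vcat(mX * β + δ[:, t], 0.0)

        # Choice probabilities over available products only
        s_jt = zeros(eltype(u_jt), J + 1)
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

The sampling call stays the same as above (NUTS(0.65), 1_000 iterations).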