Performance issues sampling a multinomial logit (slightly modified) with Turing.jl

Hi all,

I’m essentially trying to sample from a Bayesian multinomial logit with a few complexities: choices have characteristics, and a choice may not be available in every time period (this is not a time series). Choice probabilities look something like s_jt = softmax(d + X * beta), adjusted for availability (spelled out after the list of priors). X is the feature matrix of size J products × L features and includes a constant, so I use the priors:

  • for the covariance of d: Omega ~ LKJ(J, 2.0), thus d ~ MvNormal(zeros(J), Omega)
  • for beta: beta ~ MvNormal(ones(L), diagm(10 * ones(L)))
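
Spelled out (with a_jt ∈ {0, 1} the availability indicator, product 0 the always-available outside option whose utility is normalised to zero, and δ drawn per period in the code below), the adjusted probability is

$$
s_{jt} = \frac{a_{jt}\,\exp(\delta_{jt} + x_j' \beta)}{1 + \sum_{k=1}^{J} a_{kt}\,\exp(\delta_{kt} + x_k' \beta)}
$$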

I then loop over time periods and compute the choice probabilities with something like:

for t in 1:T
    observed_qjt[:, t] ~ Multinomial(mkt_size, s_jt)  # s_jt: availability-adjusted probabilities for period t
end

The code seems to work in the sense that I don’t get any errors and the sampling procedure begins, but progress stays stuck at 0% even when I reduce the number of time periods. To be precise about the dimensions of the problem: I have J = 45 choices, T = 1024 time periods and L = 4 features.

I have something that’s close to an MWE if you need it, but the matrices are quite big, so I’m not sure what the best way to share them is.

mAV = BitMatrix(vcat(data[:mAV], ones(1, size(data[:mAV], 2))))[:, mkts]  # availability, with a row of ones appended for the always-available outside option
mX = Matrix{Float64}(data[:mX])  # product features (J × L, includes a constant)
mQ = Matrix{Int64}(round.(data[:mQ]))[:, mkts]  # observed choice counts per product and period

# Priors
J = size(mX, 1)
T = size(mQ, 2)
β₀ = ones(size(mX, 2))
Vᵦ = diagm(10 * β₀)

# Settings for the Hamiltonian Monte Carlo HMC sampler
iterations = 1_000 #Number of samples
ϵ = 0.1
τ = 5

@model function estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ)

    # Instantiate intermediate objects
    δ = Matrix{Real}(undef, J, T)
    
    # Priors
    Ω_prod ~ LKJ(J, 2.0) # on δⱼ
    β ~ MvNormal(β₀, Vᵦ) # L chars

    # Define parameters
    δ .~ MvNormal(zeros(J), Symmetric(Ω_prod))

    # Market specific probabilities
    for t in 1:T
        # Extract availability
        av = mAV[:, t]

        # Instantiate data holders
        u_jt = zeros(eltype(mX[1][1] * β), J+1) 
        s_jt = zeros(eltype(mX[1][1] * β), J+1)

        # Compute utility
        u_jt[1:J] = mX * β + δ[:, t]
        u_jt[end] = 0.0

        # Compute choice probabilities for available prods
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

# Sampling
# chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), HMC(ϵ, τ), iterations)
chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), iterations)

Here is a short MWE with much smaller dimensions; there everything seems to work fine. Could it really just be an issue of dimensionality? An MLE of this model in the frequentist sense takes less than a minute on the full dataset.

using Turing
using Distributions
using LinearAlgebra
using NNlib: softmax

# Data
mAV = BitMatrix([0 1 1; 1 1 0; 1 1 1]) # availability of prod j at time t

mX = Matrix{Float64}([1.0 0.0; 1.0 0.126701]) # features

mQ = Matrix{Int64}([0 18 68; 1 53 0; 223 1524 1358]) # observed choices

# Priors
J = size(mX, 1)
T = size(mQ, 2)
β₀ = ones(size(mX, 2))
Vᵦ = diagm(10 * β₀)

# Settings for the Hamiltonian Monte Carlo HMC sampler
iterations = 1_000 #Number of samples
ϵ = 0.1
τ = 5

@model function estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ)

    # Instantiate intermediate objects
    δ = Matrix{Real}(undef, J, T)

    # Priors
    Ω_prod ~ LKJ(J, 2.0) # on δⱼ
    β ~ MvNormal(β₀, Vᵦ) # L chars

    # Define parameters
    δ .~ MvNormal(zeros(J), Symmetric(Ω_prod))

    # Market specific probabilities
    for t in 1:T
        # Extract availability
        av = mAV[:, t]

        # Instantiate data holders
        u_jt = zeros(eltype(mX[1][1] * β), J + 1)
        s_jt = zeros(eltype(mX[1][1] * β), J + 1)

        # Compute utility
        u_jt[1:J] = mX * β + δ[:, t]
        u_jt[end] = 0.0

        # Compute choice probabilities for available prods
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

# Sampling
# chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), HMC(ϵ, τ), iterations)
chn = sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), iterations)

Looks like it is. Can you check the scaling behaviour (i.e. increase each dimension by 1, 2, 3… or something sensible along those lines)?
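
For example, a quick sweep like this would do (make_data here is a hypothetical helper standing in for however you build the inputs at a given size):

using Printf

for (J, T, L) in [(2, 3, 2), (3, 4, 3), (4, 5, 4)]
    mAV, mX, mQ, β₀, Vᵦ = make_data(J, T, L)  # hypothetical helper, not defined in this thread
    elapsed = @elapsed sample(estimateBayesSharesDiv_logit(mAV, mX, mQ, J, T, β₀, Vᵦ), NUTS(0.65), 1_000)
    @printf "J = %d, T = %d, L = %d: %.1f s\n" J T L elapsed
end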

Sure, here goes:

  • J = 2, T = 3, L = 2 takes around 20-30s for 1000 iterations
  • J = 8, T = 3, L = 2 takes around 670s for 1000 iterations
  • J = 2, T = 9, L = 2 takes around 190s for 1000 iterations
  • J = 2, T = 3, L = 4 takes around 70s for 1000 iterations
  • J = 8, T = 9, L = 4 takes around 3400s for 1000 iterations

Seems to me that J and T are the largest bottlenecks, which is a shame since these are the two that get scaled up the most in the full dataset: J = 45 and T = 1024, so δ alone would be 45 × 1024 ≈ 46,000 latent parameters, on top of the 45 × 45 correlation matrix…

Okay, looks like a standard performance issue then. I know hoffe posted on Slack already, which is where most of the people who know how to speed up Turing models hang out, so maybe share this there as well and see if anyone has performance tips. Sorry I can’t be of more help, but I’ve never really used Turing, and while I’ve seen dozens of Turing performance discussions the solutions always seem a bit like magic to me…


Right, thanks anyway @nilshg!

Found something somewhat related about how to define priors for intercepts in such models. Applying it to the deltas in my model has sped up the whole thing a bit (nothing close to being usable at the dimensions of the full problem yet). Here’s the link; this post might also make me switch to Stan to compare.
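
In case it’s useful, here is roughly what that change looks like on my end: a sketch of a non-centered parameterisation of the δ draws (my adaptation of the idea, not the exact code from that post). Standard-normal innovations are scaled by the Cholesky factor of Ω_prod, so each column of δ is distributed MvNormal(0, Ω_prod) without the Matrix{Real} buffer or the broadcasted ~:

using Turing, Distributions, LinearAlgebra
using NNlib: softmax

@model function estimateBayesSharesDiv_logit_nc(mAV, mX, mQ, J, T, β₀, Vᵦ)
    # Priors
    Ω_prod ~ LKJ(J, 2.0)
    β ~ MvNormal(β₀, Vᵦ)

    # Non-centered δ: J × T standard-normal innovations, scaled so that
    # δ[:, t] ~ MvNormal(zeros(J), Ω_prod)
    z ~ filldist(Normal(), J, T)
    δ = cholesky(Symmetric(Ω_prod)).L * z

    for t in 1:T
        av = mAV[:, t]

        # Utility, with the outside option normalised to zero
        u_jt = vcat(mX * β + δ[:, t], 0.0)

        # Choice probabilities over available products only
        s_jt = zeros(eltype(u_jt), J + 1)
        s_jt[av] = softmax(u_jt[av])

        # Data ~ Model
        mkt_size = Int(round(sum(mQ[:, t])))
        mQ[:, t] ~ Multinomial(mkt_size, s_jt)
    end

    return mQ
end

The sampling call stays the same as above (NUTS(0.65), 1_000 iterations).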