Tips for large-ish latent variable model in Turing

opera_malenky · July 20, 2020, 5:10pm

Hi - I have a data set (~900 x 120) on which I’m trying to do dimension reduction. Eventually, I want to test some factor mixture models, but right now, I simply want to do a basic factor analysis. The data set isn’t huge, but it’s large enough that I’m not expecting things to be fast. However, it’s even slower than I expect.

Here’s a MWE to give a sense of things, on 1.4.2, Windows 64-bit:

using Turing, Optim
data1 = randn(900, 120)

# p = the dimension of the factor model
# this is just an example; we ignore rotational invariance issues

@model factormodel(dat, p) = begin
    N, K = size(dat)

    σ ~ filldist(InverseGamma(1, 1), K)  # factor uniquenesses
    λ ~ filldist(Normal(0, 1), p, K)           # factor loadings
    μ ~ filldist(Normal(0, 10), K)             # variable-specific intercept
    F ~ filldist(Normal(0,1), N, p)           # latent factors

    mu = F * λ

    for k in 1:K
        for i in 1:N
            dat[i,k] ~ Normal(μ[k] + mu[i,k], σ[k])
        end
    end
end;

#=
(1) MLE/MAP estimates take an *extraordinarily* long time, especially compared to e.g., R's factanal() or fa() routines. I would expect it to take a few seconds, maybe. Instead, it takes so long I get tired of waiting and cancel. :( I'm not willing to wait overnight for a model like this, ha ha.
(2) standard NUTS estimation also takes a *very* long time (and seems to spend a lot of time generating values that are "rejected for numerical errors").
(3) basic MH is slightly slow, but usable (-ish).
(4) Going from ForwardDiff to a reverse-mode AD (like Zygote) does not change anything.
=#

# Trying a reverse diff backend like Zygote doesn't change anything.
# using Zygote
# Turing.setadbackend(:zygote)

map_estimate = optimize(factormodel(data1, 1), MAP())
hmc_estimate = sample(factormodel(data1, 1), NUTS(0.65), 2000)
mh_estimate = sample(factormodel(data1, 1), MH(), 20000)

# Also tested ADVI on master, but gives errors
# advi_estimate = vi(factormodel(data1, 1), ADVI(10, 1000))

I tested on both the current stable Turing (0.13.0) and master, with same results. I certainly don’t expect this kind of model to be fast, but I would expect it to run in a feasible time. A simple one-factor model only has 900+240 parameters, which is a good number, but not outlandish.

Is there something obvious I’m missing or doing very wrong? I have read the manual’s section on optimization, and I know the one loop isn’t ideal, but I’m not sure what to replace it with, or if it’s the real culprit here. I’d welcome any advice people have.

mohamed82008 · July 20, 2020, 5:25pm

I would use LazyArrays and arraydist for that last line to vectorize and fuse everything including the AD. See https://github.com/TuringLang/TuringExamples/blob/290a2c6dcd711bdd755aafb70b8f1441518b64e4/benchmarks/logistic_reg/turing.jl for an example. Loops are generally fast in Julia, but they are still slow in reverse-mode AD. Forward-mode AD is slow for large models with more than a few hundred unobserved random variables. So if you want to make reverse-mode AD fast in Turing or Julia in general, you still need to figure out vectorization and fusing tricks. I find the fact that LazyArrays and arraydist seamlessly work together to be a powerful trick to help with that. I would also suggest you use ReverseDiff if it works as the AD package of choice. Zygote is excellent but typically has more performance issues.

opera_malenky · July 20, 2020, 6:02pm

Thanks! I’ll give it a try.

ElOceanografo · July 31, 2020, 7:12pm

Following up on this: by replacing the nested loops using LazyArrays / arraydisttrick, I got a ~25x speedup for Metropolis-Hastings sampling:

@model factormodel1(dat, p) = begin
    N, K = size(dat)

    σ ~ filldist(InverseGamma(1, 1), K)  # factor uniquenesses
    λ ~ filldist(Normal(0, 1), p, K)           # factor loadings
    μ ~ filldist(Normal(0, 10), K)             # variable-specific intercept
    F ~ filldist(Normal(0,1), N, p)           # latent factors

    mu = F * λ .+ μ'

    dat ~ arraydist(LazyArray(Base.broadcasted((m, s) -> Normal(m, s), mu, σ')))
end
@time mh_estimate = sample(factormodel(data1, 1), MH(), 200) # ~20 s
@time mh_estimate1 = sample(factormodel1(data1, 1), MH(), 200) # ~0.8 s

Most of the speedup comes from arraydist rather than LazyArrays: when I replaced the last line of the factormodel1 with

dat ~ arraydist([Normal(mu[i, k], σ[k]) for i in 1:N, k in 1:K])

it was only about 10% slower than with the arraydist(LazyArray(...)) approach. Of course, MH doesn’t explore the high-dimensional parameter space well at all, so in practice it produces barely any useful samples.

Unfortunately, sampling with NUTS is still extremely slow and produces a lot of “numerical errors” warnings. Drawing just 20 samples using NUTS took 291 seconds on my computer. However, I was able to get decent run times and sample sizes by playing around with different Gibbs schemes…the following ran in 197 seconds and appeared to converge after about 50 samples:

@time gibbs_estimate = sample(factormodel1(data1, 1), 
    Gibbs(NUTS(100, 0.35, :σ), PG(30, :λ), NUTS(100, 0.35, :μ), PG(30, :F)), 
    200)

This was arrived at by trial and error, so there are probably more efficient ways to do it.

EDIT: hit post too soon

Topic		Replies	Views
Making Turing Fast with large numbers of parameters? General Usage turing	127	9541	April 9, 2023
NUTS speed is very slow for high dimension parameter inference in Turing.jl Probabilistic programming turing	8	2329	May 13, 2022
Advice for simple (mildly large) Turing model Probabilistic programming question	9	2138	December 26, 2019
Turing: help with slow model Probabilistic programming	46	2631	February 13, 2021
Improving performance of item response model in Turing.jl Performance turing	23	2316	January 18, 2023

Tips for large-ish latent variable model in Turing

Related topics