Tips for large-ish latent variable model in Turing

Hi - I have a data set (~900 x 120) on which I’m trying to do dimension reduction. Eventually, I want to test some factor mixture models, but right now, I simply want to do a basic factor analysis. The data set isn’t huge, but it’s large enough that I’m not expecting things to be fast. However, it’s even slower than I expect.

Here’s a MWE to give a sense of things, on 1.4.2, Windows 64-bit:

using Turing, Optim
data1 = randn(900, 120)

# p = the dimension of the factor model
# this is just an example; we ignore rotational invariance issues

@model factormodel(dat, p) = begin
    N, K = size(dat)

    σ ~ filldist(InverseGamma(1, 1), K)  # factor uniquenesses
    λ ~ filldist(Normal(0, 1), p, K)           # factor loadings
    μ ~ filldist(Normal(0, 10), K)             # variable-specific intercept
    F ~ filldist(Normal(0,1), N, p)           # latent factors

    mu = F * λ

    for k in 1:K
        for i in 1:N
            dat[i,k] ~ Normal(μ[k] + mu[i,k], σ[k])
        end
    end
end;

#=
(1) MLE/MAP estimates take an *extraordinarily* long time, especially compared to e.g., R's factanal() or fa() routines. I would expect it to take a few seconds, maybe. Instead, it takes so long I get tired of waiting and cancel. :( I'm not willing to wait overnight for a model like this, ha ha.
(2) standard NUTS estimation also takes a *very* long time (and seems to spend a lot of time generating values that are "rejected for numerical errors").
(3) basic MH is slightly slow, but usable (-ish).
(4) Going from ForwardDiff to a reverse-mode AD (like Zygote) does not change anything.
=#

# Trying a reverse diff backend like Zygote doesn't change anything.
# using Zygote
# Turing.setadbackend(:zygote)

map_estimate = optimize(factormodel(data1, 1), MAP())
hmc_estimate = sample(factormodel(data1, 1), NUTS(0.65), 2000)
mh_estimate = sample(factormodel(data1, 1), MH(), 20000)

# Also tested ADVI on master, but gives errors
# advi_estimate = vi(factormodel(data1, 1), ADVI(10, 1000))

I tested on both the current stable Turing (0.13.0) and master, with same results. I certainly don’t expect this kind of model to be fast, but I would expect it to run in a feasible time. A simple one-factor model only has 900+240 parameters, which is a good number, but not outlandish.

Is there something obvious I’m missing or doing very wrong? I have read the manual’s section on optimization, and I know the one loop isn’t ideal, but I’m not sure what to replace it with, or if it’s the real culprit here. I’d welcome any advice people have.

I would use LazyArrays and arraydist for that last line to vectorize and fuse everything including the AD. See https://github.com/TuringLang/TuringExamples/blob/290a2c6dcd711bdd755aafb70b8f1441518b64e4/benchmarks/logistic_reg/turing.jl for an example. Loops are generally fast in Julia, but they are still slow in reverse-mode AD. Forward-mode AD is slow for large models with more than a few hundred unobserved random variables. So if you want to make reverse-mode AD fast in Turing or Julia in general, you still need to figure out vectorization and fusing tricks. I find the fact that LazyArrays and arraydist seamlessly work together to be a powerful trick to help with that. I would also suggest you use ReverseDiff if it works as the AD package of choice. Zygote is excellent but typically has more performance issues.

1 Like

Thanks! I’ll give it a try.

Following up on this: by replacing the nested loops using LazyArrays / arraydisttrick, I got a ~25x speedup for Metropolis-Hastings sampling:

@model factormodel1(dat, p) = begin
    N, K = size(dat)

    σ ~ filldist(InverseGamma(1, 1), K)  # factor uniquenesses
    λ ~ filldist(Normal(0, 1), p, K)           # factor loadings
    μ ~ filldist(Normal(0, 10), K)             # variable-specific intercept
    F ~ filldist(Normal(0,1), N, p)           # latent factors

    mu = F * λ .+ μ'

    dat ~ arraydist(LazyArray(Base.broadcasted((m, s) -> Normal(m, s), mu, σ')))
end
@time mh_estimate = sample(factormodel(data1, 1), MH(), 200) # ~20 s
@time mh_estimate1 = sample(factormodel1(data1, 1), MH(), 200) # ~0.8 s

Most of the speedup comes from arraydist rather than LazyArrays: when I replaced the last line of the factormodel1 with

dat ~ arraydist([Normal(mu[i, k], σ[k]) for i in 1:N, k in 1:K])

it was only about 10% slower than with the arraydist(LazyArray(...)) approach. Of course, MH doesn’t explore the high-dimensional parameter space well at all, so in practice it produces barely any useful samples.

Unfortunately, sampling with NUTS is still extremely slow and produces a lot of “numerical errors” warnings. Drawing just 20 samples using NUTS took 291 seconds on my computer. However, I was able to get decent run times and sample sizes by playing around with different Gibbs schemes…the following ran in 197 seconds and appeared to converge after about 50 samples:

@time gibbs_estimate = sample(factormodel1(data1, 1), 
    Gibbs(NUTS(100, 0.35, :σ), PG(30, :λ), NUTS(100, 0.35, :μ), PG(30, :F)), 
    200)

This was arrived at by trial and error, so there are probably more efficient ways to do it.

EDIT: hit post too soon