Avoiding loops in Turing

I’m learning Turing. Can anyone help explain the following? I define two (hopefully) equivalent models, one with loops and one without; the loop model is 50x slower but actually produces useful results. This is a reduced version of my real model, a two-class mixture model where each class is described by a small number of 1D Gaussians:

using Turing, Distributions
using DataStructures: counter   # counter is used further down to tally sampled labels

@model function modelloop(x, configs, σ, ::Type{V} = Int64) where {V}
    
    n = length(x)
    w = [0.5, 0.5] 
    p = ones(5)/5

    class  = Vector{V}(undef, n)
    pos    = Vector{V}(undef, n)
    
    for i = 1:n
        class[i] ~ Categorical(w)
        pos[i]   ~ Categorical(p)
        x[i]     ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

@model function modelvec(x, configs, σ) 
    
    n = length(x)
    w = [0.5, 0.5] 
    p = ones(5)/5

    class ~ filldist(Categorical(w), n)
    pos   ~ filldist(Categorical(p), n)
    x     ~ arraydist(map((c, l) -> Normal(configs[c][l], σ), class, pos))
end
### data
n = 200
configs = [-820:180:-100, 280:180:1000]
class   = rand(1:2, n);
pos     = rand(1:5, n);
x = [rand(Normal(configs[c][p], 5)) for (c, p) in zip(class, pos)];

ml = modelloop(x, configs, 5)
mv = modelvec(x, configs, 5)

@time chain_loop = sample(ml, PG(5), 100);  # 205 seconds
@time chain_vec  = sample(mv, PG(5), 100);  # 4 seconds!

counter(chain_loop.value.data[:, 1]), counter(chain_vec.value.data[:, 1])
(Accumulator(2.0 => 4, 1.0 => 96), Accumulator(2.0 => 44, 1.0 => 56))

In this case class 1 is correct for the first data point: modelloop is producing good results, while modelvec appears to generate samples biased toward the correct class but with the weight spread between the two.

Seems like I must be doing something stupid.

I’m not sure what might be wrong here. The speed up, which is even more dramatic on my machine, leads me to wonder if there is an error.

However, you might be able to marginalize out class and pos to obtain better performance. See this example.
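
For reference, here is a minimal sketch of what that marginalization could look like (my own illustration, not code from this thread; the model name modelmarg and the Dirichlet priors on the weights are my additions): each x[i] becomes a draw from a 10-component mixture over all (class, pos) pairs, so the discrete latents never appear as random variables and gradient-based samplers such as NUTS become available. Since the reduced models above fix the weights, I’ve given them priors here so there is something continuous left to infer.

# Sketch only: a marginalized variant of the reduced model above.
using Turing, Distributions

@model function modelmarg(x, configs, σ)
    n = length(x)
    w ~ Dirichlet(2, 1.0)   # class weights (fixed to [0.5, 0.5] in the models above)
    p ~ Dirichlet(5, 1.0)   # position weights (fixed to 1/5 in the models above)

    # One Normal component per (class, pos) pair, weighted by w[c] * p[l];
    # summing over the components marginalizes out class and pos.
    comps   = [Normal(configs[c][l], σ) for c in 1:2 for l in 1:5]
    weights = [w[c] * p[l]              for c in 1:2 for l in 1:5]
    for i in 1:n
        x[i] ~ MixtureModel(comps, weights)
    end
end

# chain_marg = sample(modelmarg(x, configs, 5), NUTS(), 500)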

Thanks for the suggestion. This is part of a larger model of biological data where it is relevant to retain information on class and pos, however, I’ll explore removing them.

It does seem like there might be an error; to reiterate, modelvec is not converging to anything like modelloop:

julia> summarystats(chain_vec)
Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat 
      Symbol   Float64   Float64    Float64   Missing    Float64   Float64 

    class[1]    1.5200    0.5021     0.0502   missing    90.3119    0.9899
    class[2]    1.3700    0.4852     0.0485   missing    72.9221    1.0110
    class[3]    1.5500    0.5000     0.0500   missing    68.1771    1.0229
    class[4]    1.5300    0.5016     0.0502   missing   131.6298    0.9935
    class[5]    1.4100    0.4943     0.0494   missing    90.2041    1.0408


julia> summarystats(chain_loop)
Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat 
      Symbol   Float64   Float64    Float64   Missing    Float64   Float64 

    class[1]    1.9500    0.2190     0.0219   missing   112.5047    1.0089
    class[2]    1.0000    0.0000     0.0000   missing        NaN       NaN
    class[3]    1.9300    0.2564     0.0256   missing    60.6572    1.0037
    class[4]    1.9800    0.1407     0.0141   missing    50.5572    1.0104
    class[5]    1.0400    0.1969     0.0197   missing   114.6199    0.9899

Compare means and std for the first 5 class labels.

Can you confirm the vector model is actually wrong? Or is it just that the vector model is doing more label switching than the loop model? A visual inspection of a trace plot might make that clear. Label switching is very annoying to deal with, but it is often considered a sign that your mixture model is doing a better job of exploring the full parameter space.
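
For example, something along these lines would let you eyeball it (a sketch of my own; it assumes StatsPlots.jl is installed and that group from MCMCChains returns the class[...] columns in index order):

using StatsPlots, MCMCChains

# Trace of the sampled class label for the first observation from each chain.
# Long alternating blocks of 1s and 2s would suggest label switching; a fairly
# even scatter between the two values looks more like plain uncertainty.
cls_loop = Array(group(chain_loop, :class))
cls_vec  = Array(group(chain_vec,  :class))
plot(cls_loop[:, 1], seriestype = :steppre, label = "loop model, class[1]")
plot!(cls_vec[:, 1], seriestype = :steppre, label = "vectorized model, class[1]")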

It’s still odd that there’s a difference between the two models, of course.

Oh, also, I’ve been doing a lot of Turing benchmarking lately, trying to find a way to make one of my models viable. This isn’t related to your problem, but I’ve decided I prefer to run a model once (with a single sample), before doing the “real” timing. It can pretty dramatically change not just the absolute timings you see (which is expected), but sometimes the relative timings, too.

For example, when I first ran your models, I got 105s/4.5s, respectively. Second and third runs were ~ 95s/0.12s. So the speedup on my machine went from 23x on the first run to ~800x on the subsequent runs.

For this same reason, when running my model “for real” and not benchmarking, I always do a warm-up run with a single sample first to shake things out. It makes a massive practical difference in speed.
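
Concretely, the pattern is just this (a minimal sketch of the warm-up idea, using the models from above):

# Throwaway single-sample runs to trigger compilation before the real timing.
sample(ml, PG(5), 1);
sample(mv, PG(5), 1);

@time chain_loop = sample(ml, PG(5), 100);
@time chain_vec  = sample(mv, PG(5), 100);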

To answer my own question above, looking at the actual chains and more closely at the model, it’s not label switching (nor could it be, since the distributions have been fixed a priori).

If it helps debug things in any way, a slightly different version of the model breaks in a somewhat different way than the completely vectorized version above:

@model function modelvec(x, configs, σ)
    
    n = length(x)
    w = [0.5, 0.5] 
    p = ones(5)/5

    class ~ filldist(Categorical(w), n)
    pos   ~ filldist(Categorical(p), n)

    for i in 1:n
        x[i] ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

When I bring the class and pos draws out of the loop but leave the data likelihood term inside it, the model always picks out the correct class for the first observation, but not for any of the others. E.g.,

julia> [mean(Array(chain_vec)[:,i] .== class[i]) for i in 1:20]
20-element Vector{Float64}:
 0.97
 0.521
 0.478
 0.501
 0.519
 0.485
 ⋮
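
As an aside, the same per-observation check can be written against the parameter names rather than raw column positions (a sketch; it assumes group from MCMCChains returns the class[...] columns in index order, and uses n and class from the data-generation block above):

using MCMCChains

cls = Array(group(chain_vec, :class))              # iterations × n matrix of class draws
acc = [mean(cls[:, i] .== class[i]) for i in 1:n]  # per-observation agreement with the true labels
acc[1:6]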

This is in fact a bug! Thank you for pointing this out!
I’ve created an issue here: https://github.com/TuringLang/Turing.jl/issues/1592.

Thanks all for the suggestions and investigation. Good to know that this is a bug and that vectorization does not work with particle samplers.

(Deleted a previous post: I had asked an additional question about the chain and just realised I was accidentally plotting the logevidence rather than my variable.)
