Just wondering how safe it is to use Threads.@threads for loops within Turing models, e.g.
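@model function my_func(Y)
    alpha ~ Normal(0, 1)
    # sigma is a standard deviation, so its prior is truncated to be positive
    sigma ~ truncated(Normal(0, 1), 0, Inf)
    # loop over the columns of Y; the loop variable must match the index used below
    Threads.@threads for j in 1:size(Y, 2)
        Y[:, j] .~ Normal(alpha, sigma)
    end
end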
This seems to work on my laptop. But I don’t currently have more than 1 thread available to me, so I can’t test it out properly. Is there any reason to avoid this in Turing?
It would also be nice to get a general view of how well Turing plays with parallel and distributed processing within models. For instance, I’m working on a Bayesian neural network using Turing and Flux, and it would be nice to utilise the GPU.
Getting Turing working smoothly with the GPU, however, is not so easy and is currently a big open issue. If you are only interested in using Turing for a BNN, then I would recommend writing the log joint yourself and using AdvancedVI or AdvancedHMC directly. Then using the GPU should work (more or less). This is how I’m doing it at the moment, but I haven’t actually used the GPU with HMC so far. So I would be a bit cautious with this, as you might run into some weird problems.
@Kai_Xu will be able to say more about the HMC on GPU stuff.
Turing models are thread-safe. Even on a laptop you should have more than one thread. Did you set the environment variable JULIA_NUM_THREADS (Multi-Threading · The Julia Language) correctly to let Julia use more than one thread?
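A quick way to check whether threading is actually enabled is sketched below. It uses only Base Julia; the array name `ids` and its length of 8 are arbitrary choices for illustration. Start Julia with, for example, `JULIA_NUM_THREADS=4 julia` (or `julia --threads 4` on Julia 1.5 and later) before running it.

```julia
using Base.Threads

# Reports how many threads this Julia session was started with.
println("Julia is running with ", nthreads(), " thread(s)")

# Each iteration writes only to its own slot, so no locking is needed.
ids = zeros(Int, 8)
@threads for i in eachindex(ids)
    ids[i] = threadid()
end

# With a single thread this shows one id; with several threads it usually shows more.
println("Iterations were handled by thread ids: ", sort(unique(ids)))
```

If the first line prints 1, the environment variable was not picked up and a `Threads.@threads` loop will simply run serially.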
Following Martin’s point, if you write your own log-density (computed by a Flux model), you can use the static HMC methods in AHMC, either on CPU or GPU. Vectorization is also supported, so you can run multiple chains in parallel, but you need to make sure your Flux model is coded to work with vectorization.
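To make "write your own log-density" concrete, here is a minimal sketch in plain Julia with no packages. Everything in it is an assumption for illustration: the tiny one-hidden-layer network, the helper names `unpack`, `predict`, `normal_logpdf` and `logjoint`, the standard normal priors, and the fixed noise scale. In practice the forward pass would come from a Flux model, and how the resulting function is handed to AdvancedHMC depends on the AdvancedHMC version, so that step is left out.

```julia
# Hypothetical tiny network: 1 input -> H hidden units (tanh) -> 1 output.
# All parameters live in one flat vector θ of length 3H + 1.
function unpack(θ)
    H = (length(θ) - 1) ÷ 3
    W1 = θ[1:H]          # input-to-hidden weights
    b1 = θ[H+1:2H]       # hidden biases
    W2 = θ[2H+1:3H]      # hidden-to-output weights
    b2 = θ[3H+1]         # output bias
    return W1, b1, W2, b2
end

# Forward pass for a vector of scalar inputs x.
function predict(θ, x)
    W1, b1, W2, b2 = unpack(θ)
    return [sum(W2 .* tanh.(W1 .* xi .+ b1)) + b2 for xi in x]
end

# Log density of Normal(μ, σ) at y, written out by hand.
normal_logpdf(y, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((y - μ) / σ)^2

# Log joint: standard normal prior on every parameter plus a Gaussian
# likelihood with a fixed noise scale σ (fixed here only for brevity).
function logjoint(θ, x, y; σ = 0.1)
    logprior = sum(normal_logpdf.(θ, 0.0, 1.0))
    loglik = sum(normal_logpdf.(y, predict(θ, x), σ))
    return logprior + loglik
end

# Toy data and a starting point with H = 3 hidden units.
x = collect(range(-1, 1; length = 20))
y = sin.(3 .* x)
θ0 = zeros(3 * 3 + 1)
println("log joint at θ0: ", logjoint(θ0, x, y))
```

The point of the flat parameter vector is that a sampler only needs a function from a vector to a scalar (plus its gradient, typically obtained by automatic differentiation), so none of Turing's bookkeeping is involved.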
Thank you both. I will have a go with my own log density. How silly, I wasn’t setting the threading environment variable when using Julia on my laptop. The covid19 model is a useful example for moving beyond the basics presented in the tutorials.
Thanks for all your work on Turing. It really is brilliant!
I don’t think this should be so hard in Soss, and I’d love to have a nice example to dig into. Ideal would be something that works well with Flux for a point estimate, but currently requires some hand-tuning for a BNN. Any suggestions?
Just a BNN will do. The main reasons this is nontrivial in Turing are that 1) the bookkeeping in Turing would currently need some refactoring, and 2) you would need to build an optimised computation graph out of the model to reduce memory transfers between CPU and GPU. Knet.jl does this, but only for deep nets. Flux is, in my understanding (which is probably outdated), still not very performant on the GPU because of this issue. But maybe this has changed.