Hi all, I am new to Mamba package, I know that it has automated paralleling for different “chains” but as my data set is a huge one, I would like to distribute the iterations on different processors. Is it possible?
Welcome! I don’t know if this is possible in Mamba, but it is in Turing:
using Distributed addprocs() @everywhere using Distributions @everywhere using Turing m = 100 nn = [rand(3:100) for i in 1:m] data = [randn(n) for n in nn] # a ragged array of data @model function RaggedParallel(data) μ ~ Normal(0, 5) σ ~ Gamma(2, 2) logliks = pmap(x -> loglikelihood(Normal(μ, σ), x), data) Turing.@addlogprob! sum(logliks) end mod = RaggedParallel(data) chn = sample(mod, PG(10), 100) using StatsPlots plot(chn)
Unfortunately it doesn’t seem to work with the HMC-based samplers, but for a simple model like this particle Gibbs (
PG) works ok…
Thanks @ElOceanografo, I haven’t tried Turing, But it seems a good package for running mcmc.
This is my model: in Mamba
y = Stochastic(2,
(pop, time, alpha, x, beta) ->
p = invlogit(alpha * x[i, j]+beta);
Bernoulli§) for i in 1:time, j in 1:pop
Which is a hierarchical
logistic regression model, Y is binary and X is covariates.
Do you think I can apply Turing in this case?
I’m not 100% sure what your model is–the code snippet you posted seems to only have one level. Do you think you could make up a minimum working example? That said, it is definitely possible to do a (hierarchical) logistic regression in Turing. Here’s a basic example showing one way to do it:
using Turing using Distributions # define logistic (invlogit) function logistic(x) = 1 / (1 + exp(-x)) # make up some fake parameters and data npop = 2 ntime = 10 alpha_hyper = Normal(1, 2) beta_hyper = Normal(0, 0.4) alpha = rand(alpha_hyper, 1, 2) beta = rand(beta_hyper, 1, 2) x = randn(ntime, npop) p = logistic.(alpha .* x .+ beta) y = rand.(Bernoulli.(p)) @model function hierarchical_logistic_reg(x, y) # priors on hyperparameters mu_alpha ~ Normal(0, 5) sigma_alpha ~ Gamma(2, 3) mu_beta ~ Normal(0, 5) sigma_beta ~ Gamma(2, 3) # regression parameters alpha ~ Normal(mu_alpha, sigma_alpha) beta ~ Normal(mu_beta, sigma_beta) p = logistic.(alpha .* x .+ beta) # observation likelihood y ~ arraydist(Bernoulli.(p)) end model = hierarchical_logistic_reg(x, y) chn = sample(model, NUTS(), 1000) using StatsPlots plot(chn)
If you’re just starting out with Bayesian modeling in Julia, I’d probably recommend using Turing–it’s a more recent and more actively developed library at this point than Mamba.
One other point about doing distributed computations inside the model: double-check that it’s actually giving you a speedup compared to serial computation. For the example I gave above, the overhead of doing the parallel likelihood calculations actually make that approach slower than a simpler serial version.