Hi all, I'm new to the Mamba package. I know it has automated parallelization across different “chains”, but since my data set is a huge one, I would like to distribute the iterations across different processors. Is that possible?
Welcome! I don’t know if this is possible in Mamba, but it is in Turing:
using Distributed
addprocs()
@everywhere using Distributions
@everywhere using Turing
m = 100                          # number of groups
nn = [rand(3:100) for i in 1:m]  # a different sample size for each group
data = [randn(n) for n in nn]    # a ragged array of data
@model function RaggedParallel(data)
    μ ~ Normal(0, 5)
    σ ~ Gamma(2, 2)
    # compute the per-group log-likelihoods on the workers, then add them to the joint log-density
    logliks = pmap(x -> loglikelihood(Normal(μ, σ), x), data)
    Turing.@addlogprob! sum(logliks)
end
mod = RaggedParallel(data)
chn = sample(mod, PG(10), 100)
using StatsPlots
plot(chn)
Unfortunately it doesn’t seem to work with the HMC-based samplers, but for a simple model like this, particle Gibbs (PG) works OK…
Thanks @ElOceanografo, I haven’t tried Turing, but it seems like a good package for running MCMC.
This is my model in Mamba:
y = Stochastic(2,
    (pop, time, alpha, x, beta) ->
        UnivariateDistribution[(
            p = invlogit(alpha * x[i, j] + beta);
            Bernoulli(p)) for i in 1:time, j in 1:pop
        ],
    false
),
This is a hierarchical logistic regression model, where Y is binary and X contains the covariates.
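That is, reading off the snippet above, each observation is modelled as
y[i, j] ~ Bernoulli(invlogit(alpha * x[i, j] + beta)), for i in 1:time, j in 1:pop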
Do you think I can apply Turing in this case?
I’m not 100% sure what your model is; the code snippet you posted seems to only have one level. Do you think you could put together a minimal working example? That said, it is definitely possible to do a (hierarchical) logistic regression in Turing. Here’s a basic example showing one way to do it:
using Turing
using Distributions
# define logistic (invlogit) function
logistic(x) = 1 / (1 + exp(-x))
# make up some fake parameters and data
npop = 2
ntime = 10
alpha_hyper = Normal(1, 2)
beta_hyper = Normal(0, 0.4)
alpha = rand(alpha_hyper, 1, npop)
beta = rand(beta_hyper, 1, npop)
x = randn(ntime, npop)
p = logistic.(alpha .* x .+ beta)
y = rand.(Bernoulli.(p))
@model function hierarchical_logistic_reg(x, y)
    # priors on hyperparameters
    mu_alpha ~ Normal(0, 5)
    sigma_alpha ~ Gamma(2, 3)
    mu_beta ~ Normal(0, 5)
    sigma_beta ~ Gamma(2, 3)
    # regression parameters
    alpha ~ Normal(mu_alpha, sigma_alpha)
    beta ~ Normal(mu_beta, sigma_beta)
    p = logistic.(alpha .* x .+ beta)
    # observation likelihood
    y ~ arraydist(Bernoulli.(p))
end
model = hierarchical_logistic_reg(x, y)
chn = sample(model, NUTS(), 1000)
using StatsPlots
plot(chn)
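Note that this version pools a single alpha and beta across all the populations. If you want each population to get its own coefficients drawn from the shared hyperpriors (which is closer to how the fake data above were generated), here is a minimal sketch of one way to do it using Turing’s filldist (hierarchical_logistic_reg_pop is just a made-up name):
@model function hierarchical_logistic_reg_pop(x, y)
    npop = size(x, 2)
    # priors on hyperparameters
    mu_alpha ~ Normal(0, 5)
    sigma_alpha ~ Gamma(2, 3)
    mu_beta ~ Normal(0, 5)
    sigma_beta ~ Gamma(2, 3)
    # one alpha and beta per population, sharing the same hyperpriors
    alpha ~ filldist(Normal(mu_alpha, sigma_alpha), npop)
    beta ~ filldist(Normal(mu_beta, sigma_beta), npop)
    # alpha' and beta' are 1 x npop, so each column of x gets its own coefficients
    p = logistic.(x .* alpha' .+ beta')
    # observation likelihood
    y ~ arraydist(Bernoulli.(p))
end
You should be able to sample it the same way as above, e.g. sample(hierarchical_logistic_reg_pop(x, y), NUTS(), 1000).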
If you’re just starting out with Bayesian modeling in Julia, I’d probably recommend using Turing; at this point it’s a more recent and more actively developed library than Mamba.
One other point about doing distributed computations inside the model: double-check that it’s actually giving you a speedup compared to serial computation. For the example I gave above, the overhead of the parallel likelihood calculations actually makes that approach slower than a simpler serial version.
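For example, you could compare against a serial version of the RaggedParallel model above, identical except that pmap is replaced with a plain map so everything runs on one process, and time both:
@model function RaggedSerial(data)
    μ ~ Normal(0, 5)
    σ ~ Gamma(2, 2)
    # same likelihood as RaggedParallel, but computed serially with map
    logliks = map(x -> loglikelihood(Normal(μ, σ), x), data)
    Turing.@addlogprob! sum(logliks)
end
# time both (run each once first so you’re not measuring compilation)
@time sample(RaggedParallel(data), PG(10), 100)
@time sample(RaggedSerial(data), PG(10), 100)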