How do I distribute Mamba mcmc function iterations on different processors?

Hi all, I am new to the Mamba package. I know it has automated parallelization across different “chains”, but since my data set is huge, I would like to distribute the iterations within a chain across different processors. Is that possible?

Welcome! I don’t know if this is possible in Mamba, but it is in Turing:

using Distributed
addprocs()
@everywhere using Distributions
@everywhere using Turing
    
m = 100                          # number of groups
nn = [rand(3:100) for i in 1:m]  # number of observations in each group
data = [randn(n) for n in nn]    # a ragged array of data

@model function RaggedParallel(data)
    μ ~ Normal(0, 5)
    σ ~ Gamma(2, 2)
    # distribute the per-group likelihood calculations across the worker processes
    logliks = pmap(x -> loglikelihood(Normal(μ, σ), x), data)
    Turing.@addlogprob! sum(logliks)
end

mod = RaggedParallel(data)
chn = sample(mod, PG(10), 100)

using StatsPlots
plot(chn)

Unfortunately it doesn’t seem to work with the HMC-based samplers, but for a simple model like this particle Gibbs (PG) works ok…
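If parallelizing over chains is enough (which is what Mamba's built-in parallelism does), Turing also supports that, and it does work with NUTS/HMC. Here is a minimal sketch, not from the original thread, assuming workers have already been added with addprocs() as above; SimpleNormal and its data are made-up names purely for illustration:

@everywhere using Distributions, Turing

# a plain (non-distributed) model, just to illustrate chain-level parallelism
@everywhere @model function SimpleNormal(data)
    μ ~ Normal(0, 5)
    σ ~ Gamma(2, 2)
    data .~ Normal(μ, σ)
end

obs = randn(1000)
# run 4 chains, one per worker process; MCMCThreads() would use threads instead
chn = sample(SimpleNormal(obs), NUTS(), MCMCDistributed(), 1000, 4)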

Thanks @ElOceanografo, I haven’t tried Turing, but it seems like a good package for running MCMC.

This is my model in Mamba:

y = Stochastic(2,
    (pop, time, alpha, x, beta) ->
        UnivariateDistribution[
            (p = invlogit(alpha * x[i, j] + beta);
             Bernoulli(p)) for i in 1:time, j in 1:pop
        ],
    false
),

This is a hierarchical logistic regression model, where y is binary and x contains the covariates.
Do you think I can apply Turing in this case?

I’m not 100% sure what your model is–the code snippet you posted seems to only have one level. Do you think you could make up a minimum working example? That said, it is definitely possible to do a (hierarchical) logistic regression in Turing. Here’s a basic example showing one way to do it:

using Turing
using Distributions

# define logistic (invlogit) function
logistic(x) = 1 / (1 + exp(-x))
# make up some fake parameters and data
npop = 2
ntime = 10
alpha_hyper = Normal(1, 2)
beta_hyper = Normal(0, 0.4)
alpha = rand(alpha_hyper, 1, npop)  # one slope per population
beta = rand(beta_hyper, 1, npop)    # one intercept per population
x = randn(ntime, npop)
p = logistic.(alpha .* x .+ beta)
y = rand.(Bernoulli.(p))

@model function hierarchical_logistic_reg(x, y)
    # priors on hyperparameters
    mu_alpha ~ Normal(0, 5)
    sigma_alpha ~ Gamma(2, 3)
    mu_beta ~ Normal(0, 5)
    sigma_beta ~ Gamma(2, 3)
    # regression parameters
    alpha ~ Normal(mu_alpha, sigma_alpha)
    beta ~ Normal(mu_beta, sigma_beta)
    p = logistic.(alpha .* x .+ beta)
    # observation likelihood
    y ~ arraydist(Bernoulli.(p))
end

model = hierarchical_logistic_reg(x, y)
chn = sample(model, NUTS(), 1000)

using StatsPlots
plot(chn)

If you’re just starting out with Bayesian modeling in Julia, I’d probably recommend using Turing–at this point it’s more recent and more actively developed than Mamba.

One other point about doing distributed computations inside the model: double-check that it’s actually giving you a speedup compared to serial computation. For the example I gave above, the overhead of the parallel likelihood calculations actually makes that approach slower than a simpler serial version.
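To make that comparison concrete, here is a sketch of what the serial version could look like, reusing the model and data from the first example (RaggedSerial is a made-up name, not code from the original posts):

# same model as RaggedParallel, but the per-group log-likelihoods
# are summed on a single process instead of via pmap
@model function RaggedSerial(data)
    μ ~ Normal(0, 5)
    σ ~ Gamma(2, 2)
    Turing.@addlogprob! sum(loglikelihood(Normal(μ, σ), x) for x in data)
end

# crude timing comparison; run each once beforehand to exclude compilation time
@time sample(RaggedSerial(data), PG(10), 100)
@time sample(RaggedParallel(data), PG(10), 100)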