Differential evolution MCMC in Julia?

Hello,

is there a Julia package that implements the Differential Evolution MCMC sampler?

Is this possible with Turing.jl (I like Turing.jl quite a lot)?

At the moment, I use the BayesianTools R package, which implements various samplers that are optimised for the calibration of (ecological) system models. For such models it is not possible to analytically calculate the derivative of the likelihood.

Here is a link to the R package:
https://cran.r-project.org/web/packages/BayesianTools/vignettes/BayesianTools.html#differential-evolution-mcmc

At the moment, I call Julia from R to calculate the likelihood for a given parameter set of my ecological system model, and the R package does all the sampling (proposing new parameter values etc…). Here is the script that I use: https://github.com/FelixNoessler/RegionalGrasslandSim.jl/blob/master/lib/RegionalGrasslandValid/scripts/parameter_estimation.R

I was wondering if there is a package that already implements differential evolution MCMC in Julia?

I know that there is the Metaheuristics.jl package, which also implements a differential evolution method. However, this package is designed for optimization, not for getting MCMC chains back.

Best
Felix

You can always use any custom MCMC sampling package by converting a Turing.Model to a LogDensityProblem; after converting it, you can pass LogDensityProblems.logdensity to the sampling package.
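A minimal sketch of that conversion, assuming DynamicPPL's LogDensityFunction wrapper and a toy model named demo (both just for illustration):

using Turing, LogDensityProblems
using DynamicPPL: LogDensityFunction

@model function demo()
    x ~ Normal(0, 1)
    y ~ Normal(x, 1)
end

# wrap the model so it implements the LogDensityProblems interface
ldf = LogDensityFunction(demo())

# an external sampler only needs this callable:
logp(θ) = LogDensityProblems.logdensity(ldf, θ)
logp([0.1, -0.3])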

After a quick search, it does seem that a few differential evolution MCMC packages exist. This one, for example. I have absolutely no idea how reliable or popular they are, so use them at your own risk. Is there a particular reason you need DE-MCMC algorithms, though?

Hello,

thanks for your answer!

I used a differential evolution algorithm because I thought that my model is not compatible with automatic differentiation. But maybe I am wrong about that. It is written purely in Julia, so maybe I should try it? What are the conditions for automatic differentiation?

Another option when no gradients are available is a random-walk Metropolis algorithm, but then the proposal distribution has to be tuned, and I read that the differential evolution algorithm can be quite efficient. Do you have other ideas?

The first requirement is that the model can be differentiated and has a closed-form likelihood function. For example, if your data are normally distributed, you can write x ~ Normal(μ, σ), which has the closed-form log-likelihood logpdf(Normal(μ, σ), x). The second requirement is that the code works with AD. Making the code work with AD typically does not require much effort: basically, the types have to be general enough that they work with the Dual type, which is a subtype of Real. We could provide more information if you describe your model or share working code.
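To make the first requirement concrete, here is a minimal sketch with toy data (not your grassland model):

using Turing

@model function gaussian_fit(x)
    μ ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 5); lower = 0)
    for i in eachindex(x)
        x[i] ~ Normal(μ, σ)   # each observation contributes logpdf(Normal(μ, σ), x[i])
    end
end

# once the model body is AD-friendly, a gradient-based sampler works:
# chain = sample(gaussian_fit(randn(100)), NUTS(), 1_000)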


For non-differentiable but low-dimensional likelihoods, I think slice sampling does wonders. There is no official Turing-compatible implementation of slice sampling at the moment, though.


Maybe you are looking for approximate Bayesian computation (ABC)? If your problem is intractable and not AD-compatible, but you can simulate it, these should help:

The first two use differential evolution under the hood; the others maybe do too, but I didn’t check.


A really good option is Pigeons.jl, which uses reversible parallel tempering. In my experience this works a lot better than differential-evolution-type samplers, and it handles multi-modal distributions as well. Plus, it has an interface to Turing, so you should be able to just use your Turing model.


Thanks for your answer! I can write the likelihood function. It is a combined likelihood of the soil moisture (Beta distribution), the biomass (Laplace distribution), and several community-weighted traits (Laplace distribution). I checked whether my model code is directly AD-compatible, but apparently it is not: I got an error with Dual numbers. I think I will first try out the Pigeons.jl package.
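Roughly, the combined log-likelihood has this shape (all names and parameter values below are placeholders, not my actual code):

using Distributions

# placeholder scales; pred = model predictions, obs = observations
function combined_loglik(pred, obs; b_biomass = 50.0, b_trait = 0.1)
    logpdf(Beta(pred.moisture_α, pred.moisture_β), obs.soilmoisture) +
        logpdf(Laplace(pred.biomass, b_biomass), obs.biomass) +
        sum(logpdf.(Laplace.(pred.traits, b_trait), obs.traits))
end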

Thanks for the hint! Are up to 30 parameters still low-dimensional for you?

Thanks for the suggestion! I think I will first use methods that work directly with the likelihood instead of summary statistics (or did I misunderstand approximate Bayesian computation?).

This package seems pretty cool!


In the meantime, I ran an MCMC with a differential evolution algorithm (with sampling from the past, but without snooker updates). I used initial values that were generated with Metaheuristics.jl.

I got an acceptance rate between 5 and 10 %, which is not too bad. The chains did not converge, at least for some parameters (I ran two independent chains with three internal chains each and a thinning rate of 10; the red line on the right side of the plot is the prior). I have to check if reparametrisation makes sense.


I wanted to try the Pigeons.jl package, but I got an error message:

ERROR: SliceSampler supports contrained target, but the sampler should be initialized in the support:

I think the problem is that a likelihood cannot be calculated for all parameter values that are possible under the priors, so I just return -Inf in these cases. This happens, for example, if the overall biomass of a grassland is zero; then I cannot evaluate the likelihood of the community-weighted mean traits.

Is there a way to specify initial parameter values for the sampler?

Here is a reproducible example. I know it is not short and the precompilation takes quite some time.

import Pkg
Pkg.add(url="https://github.com/felixnoessler/GrasslandTraitSim.jl")

using Pigeons, Distributions, DistributionsAD, Turing

import GrasslandTraitSim.Valid as valid
import GrasslandTraitSim as sim

###################
training_plots = ["$(explo)$(lpad(i, 2, "0"))" for i in 1:9 for explo in ["HEG"]]
input_objs = valid.validation_input_plots(;
    plotIDs = training_plots,
    nspecies = 25,
    trait_seed = 123);
valid_data = valid.get_validation_data_plots(;
    plotIDs = training_plots);
preallocated = [sim.preallocate_vectors(; input_obj = input_objs[training_plots[1]])
                for t in 1:Threads.nthreads()];
mp = valid.model_parameters(;)
###################

function turing_ll(x)
    loglik_vec = Array{Float64}(undef, length(training_plots))
    inf_p = (; zip(Symbol.(mp.names), x)...)

    # evaluate the log-likelihood of each training plot in parallel
    Threads.@threads for p in eachindex(training_plots)
        loglik_vec[p] = valid.loglikelihood_model(sim; input_objs, valid_data,
            calc = preallocated[Threads.threadid()],
            plotID = training_plots[p], inf_p)
    end

    # combined log-likelihood over all plots
    return sum(loglik_vec)
end


@model function calib_model()
    moistureconv_alpha ~ Normal(0.0, 1.0)
    moistureconv_beta ~ Normal(0.0, 1.0)
    sen_α ~ truncated(Normal(0.0, 0.1); lower = 0)
    sen_leaflifespan ~ truncated(Normal(0.0, 0.1); lower = 0)
    sla_tr ~ truncated(Normal(0.02, 0.01); lower = 0)
    sla_tr_exponent ~ truncated(Normal(1.0, 5.0); lower = 0)
    βₚₑₜ ~ truncated(Normal(1.0, 1.0); lower = 0)
    biomass_dens ~ truncated(Normal(1000.0, 1000.0); lower = 0)
    belowground_density_effect ~ truncated(Normal(1.0, 0.5); lower = 0)
    height_strength ~ Uniform(0.0, 1.0)
    leafnitrogen_graz_exp ~ truncated(Normal(1.0, 5.0); lower = 0)
    trampling_factor ~ truncated(Normal(200.0, 100.0); lower = 0)
    grazing_half_factor ~ truncated(Normal(150.0, 50.0); lower = 0)
    mowing_mid_days ~ truncated(Normal(10.0, 10.0); lower = 0)
    totalN_β ~ truncated(Normal(0.1, 0.1); lower = 0)
    CN_β ~ truncated(Normal(0.1, 0.1); lower = 0)
    max_rsa_above_water_reduction ~ Uniform(0.0, 1.0)
    max_sla_water_reduction ~ Uniform(0.0, 1.0)
    max_amc_nut_reduction ~ Uniform(0.0, 1.0)
    max_rsa_above_nut_reduction ~ Uniform(0.0, 1.0)
    b_biomass ~ InverseGamma(5.0, 2000.0)
    b_soilmoisture ~ truncated(Normal(0.0, 15.0); lower = 0)
    b_sla ~ InverseGamma(10.0, 0.1)
    b_lncm ~ InverseGamma(2.0, 5.0)
    b_amc ~ InverseGamma(10.0, 3.0)
    b_height ~ InverseGamma(10.0, 3.0)
    b_rsa_above ~ InverseGamma(20.0, 0.2)

    p = [moistureconv_alpha, moistureconv_beta, sen_α, sen_leaflifespan,
         sla_tr, sla_tr_exponent, βₚₑₜ, biomass_dens, belowground_density_effect,
         height_strength, leafnitrogen_graz_exp, trampling_factor,
         grazing_half_factor, mowing_mid_days, totalN_β, CN_β,
         max_rsa_above_water_reduction, max_sla_water_reduction,
         max_amc_nut_reduction, max_rsa_above_nut_reduction,
         b_biomass, b_soilmoisture, b_sla, b_lncm, b_amc, b_height, b_rsa_above]

    Turing.@addlogprob! turing_ll(p)

    return nothing
end


m = calib_model()

#### just check that the model can return a loglik
inf_p = (; zip(Symbol.(mp.names), mp.best)...)
TuringLogPotential(m)(inf_p)


#### use pigeons to run the model
pt = pigeons(target = TuringLogPotential(m))

No problem! By the way, you might be able to fix the Dual numbers error. One common mistake is creating an array with a fixed element type. For example:

function f(θ)
    n = length(θ)
    v = zeros(n)   # v is Vector{Float64}, regardless of the eltype of θ
    v .= θ .^ 2    # errors when θ contains Dual numbers
    return sum(v)
end

The solution is to make the element type of v depend on the parameter vector, whose elements can be Dual when using AD:

function f(θ::AbstractArray{T}) where {T <: Real}
    n = length(θ)
    v = zeros(T, n)   # now v can hold Float64 or Dual, matching θ
    v .= θ .^ 2
    return sum(v)
end

There are other ways AD can create an error, but that is a common one.
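A quick sanity check with the corrected definition above (which just squares and sums its input):

using ForwardDiff

ForwardDiff.gradient(f, [1.0, 2.0, 3.0])   # returns [2.0, 4.0, 6.0], no Dual error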

Regarding the error message for Pigeons, I’m not sure if there is currently a way to specify initial values. It seems like this feature has not been added yet: `sample` which takes in an initial state · Issue #109 · TuringLang/AbstractMCMC.jl · GitHub

Returning -Inf for some parameter values might also be what is causing the Dual type error.

Yes, correct (sorry, I didn’t read closely what you have available and what not)! If you have an analytical likelihood model that suits your data well and can be evaluated quickly, you should make use of it.

ABC is super cool when analytical likelihoods are difficult to obtain or the likelihood evaluation is too costly (any model can be simulated). The use of summary statistics is typical for ABC (and may introduce approximation errors if they are not statistically sufficient), but it is not strictly required; one can also use the complete data, though whether this is practical depends on the problem.

Hi all, I know I mentioned slice sampling a few days ago and said that there is no official implementation. Well… I took matters into my own hands. I have a package implementing slice sampling that supports the AbstractMCMC interface. It is not yet registered, but it is more or less feature-complete. Feedback would be much appreciated.
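For anyone curious, usage under the AbstractMCMC interface looks roughly like this (the sampler constructor LatentSlice and its scale argument are my reading of the package at the time of writing, so double-check against the docs):

using AbstractMCMC, LogDensityProblems, SliceSampling

# toy target implementing the LogDensityProblems interface
struct Gauss end
LogDensityProblems.logdensity(::Gauss, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::Gauss) = 2
LogDensityProblems.capabilities(::Type{Gauss}) =
    LogDensityProblems.LogDensityOrder{0}()

model = AbstractMCMC.LogDensityModel(Gauss())
chain = sample(model, LatentSlice(2.0), 1_000; initial_params = zeros(2))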


Hello,

thanks for all your answers!

I used your tip about the types, and now I can create gradients with Tracker.jl. With ForwardDiff.jl I get NaNs, even with nansafe_mode = true, and I got errors with Zygote.jl and ReverseDiff.jl. But that’s not too bad! :slight_smile:
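A minimal gradient smoke test with Tracker.jl, assuming turing_ll and mp from the reproducible example above (a sketch, not my exact code):

using Tracker

# take the gradient of the combined log-likelihood at the best-fit parameters
g, = Tracker.gradient(turing_ll, mp.best)
Tracker.data(g)   # extract the plain Float64 gradient vector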

I checked the documentation of Pigeons.jl again, and it is possible to specify initial values when I use the “black-box” Julia function interface instead of DynamicPPL.jl/Turing.jl. It seems rather slow, but it is working!
https://pigeons.run/stable/input-julia/
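For reference, the black-box interface from that docs page looks roughly like this; Pigeons.initialization is the hook for supplying initial values (a sketch based on my reading of the linked page, with toy definitions):

using Pigeons, Distributions, Random

struct MyLogPotential end

# log density at a parameter vector; return -Inf outside the support
(lp::MyLogPotential)(x) = logpdf(Normal(0, 1), x[1]) + logpdf(Normal(0, 1), x[2])

# this hook supplies the initial state for each replica
Pigeons.initialization(::MyLogPotential, ::AbstractRNG, ::Int) = [0.5, 0.5]

pt = pigeons(target = MyLogPotential(), reference = MyLogPotential())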


Hey,

this seems cool! Do you know how different your package is from the Pigeons.jl package? Apparently, they also use a slice sampler.

Pigeons uses parallel tempering, so it will require much more computation in general, and it only supports the classic slice sampling algorithms by Neal (2003). SliceSampling.jl also provides a newer algorithm called latent slice sampling, which requires less tuning effort and probably scales a little better to higher dimensions. But if your problem is challenging (multimodal or degenerate), Pigeons might be the only thing that works.

Thanks a lot for all the explanations from all participants! I got a good overview. :slight_smile:

Maybe you can also announce it in the Package Announcements channel? I am happy to give you some feedback soon.

Thanks, I will do so after ironing out a few things.