[ANN-RFC] KissABC.jl- Approximate Bayesian Computation

francesco.alemanno · June 3, 2020, 1:25pm

Hi everyone i made a very self-contained package to efficiently perform Approximate Bayesian Computation, this package implements a hybrid MCMC sampler with 3 different affine invariant proposals, solving many of the shortcomings of other MCMC algorithms (adaptive and non adaptive)

this package compared to the alternatives in the julia ecosystem offers:

Ability to sample from mixed support priors (Discrete and Continuous)
Ability to easily implement support for custom types of ABC likelihoods
supports both kernelized ABC and threshold ABC
Very efficient MCMC algorithm with proposals which are insensitive to scale and orientation of the target distribution (Affine Invariant sampling)
A super simple unit tested code, which can be easily extended with more algorithms and features

it does not offer (and will never offer):

surrogate models to speed up ABC
automatic distance functions between datasets

~~this package at the moment is not in Julia Registry, if there is some interest i will register it~~

the package is now registered, add it via

import Pkg
Pkg.add("KissABC")

the repository can be found at

jlapeyre · June 3, 2020, 2:13pm

mohamed82008 · June 3, 2020, 3:30pm

Would be cool to interface this with Turing if possible!

francesco.alemanno · June 3, 2020, 3:31pm

i would love for this to happen: Turing + ABC

robsmith11 · June 3, 2020, 3:48pm

I just gave it a try with a toy exponential mixture model. It works well and produces similar results to ApproxBayes.jl’s ABCSMC, but seems to be bit less sample-efficient.

When using the same \epsilon, KissABC.jl uses about 4x more samples, but produces a posterior with tighter 95% intervals (I’m guessing this is related to ApproxBayes.jl stopping early because of the default tolerance). When I increase the \epsilon for KissABC.jl to get a similar posterior, it still requires 2.5x more samples than ApproxBayes.jl.

Would you expect better results? Is it just a case of not working as well on this specific model?

function sim((u1, p1), params; n=10^6, raw=false)
 u2 = (1.0 - u1*p1)/(1.0 - p1)
 x = randexp(n) .* ifelse.(rand(n) .< p1, u1, u2)
 raw && return x
 [std(x), median(x)]
end

function dist(s, s0)
 sqrt(sum(((s .- s0)./s).^2))
end

ABCSMCPR(Factored(Uniform(0,1), Uniform(0.5,1)), sim, [2.2, 0.4], dist, 0.01, nparticles=100, parall
el=true)

francesco.alemanno · June 3, 2020, 3:54pm

@robsmith11
you can tune parameter c to obtain comparable sampling costs, i had direct experience with the sampler from Toni et al. 2009 implemented in ApproxBayes.jl and it suffers from degeneracy of weights, so the lesser costs in CPU time can turn out to be due to incorrect sampling

in any case
in KissABC.jl the sampling cost is tunable, a higher value of c which defaults to 0.01 will allow the algorithm to spend less time on individual samples, the quality of the posterior will not suffer a lot, but there will be some sample duplication

robsmith11 · June 3, 2020, 4:02pm

Ok, thanks. I really don’t know much about the details of the SMC alogrithms. It usually produces good results for my models, so I’ve been using it. I did have some problems with the posteriors produced using ApproxBayes.jl with some more complex models, so maybe that’s related to what you mentioned. I’ll experiment more later. It’s great to have multiple implementations to compare!

pjentsch0 · June 3, 2020, 4:24pm

Thanks, cool package! Would it be possible to add support to parallelize the particle computation over worker processes (ala Distributed.jl) rather than just threads? It seems like your code uses @threads internally.

francesco.alemanno · June 3, 2020, 4:27pm

it can be done trivially i think , just run multiple SMC processes one for each worker just with fewer particles at the end collect all the results and join them

francesco.alemanno · June 4, 2020, 1:13am

i just implemented a WIP version of ABC SMC - Differential Evolution, it is scary fast, give it a spin

ABCDE is the method name

robsmith11 · June 4, 2020, 4:02am

Nice! It is indeed very fast for me. With the same settings and the mixture model, it’s 10x faster than ABCSMCPR and 3x faster than ApproxBayes. If it works as well for more complex models, it will be quite a nice speed up for some of my work.

You used the algorithm from Turner et al. 2012 for ABCDE? Are there any downsides relative to SMC of which to be careful?

dlakelan · June 4, 2020, 4:32am

instead of a square accept/reject kernel (ie. distance stat within epsilon) can it handle smooth kernels?

francesco.alemanno · June 4, 2020, 5:43am

not that i know of

at the moment no, as smooth kernels tend to complicate the algorithms, i still have to try them.

jbrea · June 4, 2020, 6:36am

I started a similar effort some time ago here:

I didn’t include the Drovandi et al. (2011) method, because I thought for some reason that the similar method by Del Moral et al. (2011) was somewhat superior. Anyway, I usually find kernel-based ABC methods (also included in the package) to be much more sample efficient than MCMC-ABC methods. @francesco.alemanno would you be interested in joining forces to write a really nice and feature-rich ABC package? I like collaborative projects .

francesco.alemanno · June 4, 2020, 9:20am

@jbrea that would be nice! a full fledged powerful ABC package, i would also like to collaborate

pjentsch0 · June 4, 2020, 4:06pm

i just implemented a WIP version of ABC SMC - Differential Evolution, it is scary fast, give it a spin
ABCDE is the method name

Wow, it is really fast. That’s awesome, thank you!

francesco.alemanno · June 7, 2020, 4:58pm

The package is now registered, have fun! and remember to let me know if you find issues or obvious limitations that might have gone unnoticed

mcreel · June 10, 2020, 10:32am

Thanks for the very interesting package . I have been playing around a bit with the mixture model in the documentation. I find that the confidence intervals seem to be too broad, the true parameters are inside more than 90% of the times it is repeated, in my limited experimentation. For example, running the code attached below, which uses 10 repetitions, I get, in three runs

mean(inci, dims = 1) = [1.0 1.0 1.0 0.8 1.0]
mean(inci, dims = 1) = [1.0 1.0 1.0 1.0 0.9]
mean(inci, dims = 1) = [1.0 1.0 1.0 0.9 1.0]

Each time this is run, we compute 10 CIs for each of 5 parameters. So, in the three repetitions, 150 CIs were computed. There were 4 rejections of a true parameter, when we would expect 15. I was wondering if there’s a good way to tune this to try to make the CI coverage more accurate. The code I ran follows. It uses the code for the mixture model, but with a small addition to keep track of the confidence interval results. Thanks again!

#using Pkg
#Pkg.activate("./")
using KissABC
using Distributions
using Plots

function model(P,N)
    μ_1, μ_2, σ_1, σ_2, prob=P
    d1=randn(N).*σ_1 .+ μ_1
    d2=randn(N).*σ_2 .+ μ_2
    ps=rand(N).<prob
    R=zeros(N)
    R[ps].=d1[ps]
    R[.!ps].=d2[.!ps]
    R
end

function D(x,y)
    r=0:0.01:1
    sum(abs2,quantile.(Ref(x),r).-quantile.(Ref(y),r))/length(r)
end

function getstats(P,V)
    (
        param=P,
        median=median(V),
        lowerbound=quantile(V,0.05),
        upperbound=quantile(V,0.95)
    )
end

function main(reps)
inci = zeros(reps,5) # holder for true parameters in 90% CI
for r = 1:reps
    println("repetition ", r)
    parameters = (1.0, 0.0, 0.2, 2.0, 0.4)
    data=model(parameters,5000)
    #histogram(data)
    prior=Factored(
                Uniform(0,3), # there is surely a peak between 0 and 3
                Uniform(-2,2), #there is a smeared distribution centered around 0
                Uniform(0,1), # the peak has surely a width below 1
                Uniform(0,4), # the smeared distribution surely has a width less than 4
                Beta(4,4) # the number of total events from both distributions look about the same, so we will favor 0.5 just a bit
            );
    plan=ABCplan(prior,model,data,D,params=5000)
    res,Δ=ABCDE(plan,0.02,parallel=true,verbose=false);
    stats=getstats.((:μ_1, :μ_2, :σ_1, :σ_2, :prob),[getindex.(res,i) for i in 1:5])

    for is in eachindex(stats)
        #println(parameters[is], " → ", stats[is])
        inci[r, is] = stats[is][3]<parameters[is] && stats[is][4]>parameters[is]
    end
end

println("90% CI coverage")
@show mean(inci,dims=1)
return nothing
end

francesco.alemanno · June 10, 2020, 2:27pm

Thank you for your feedback! glad you find it interesting.
ABC methods most of the time suffer from having wider CI’s than exact methods, since both the imperfect distance metric, and the non-zero tolerance tend to affect CI’s. i think the example can be improved by considering a Kolmogorov-Smirnov distance and a lower tolerance.
by the way the example has been improved a tiny bit in the most recent docs, the most notable difference is the use of abs instead of abs2.

dlakelan · June 10, 2020, 5:04pm

Bayesian methods are not Confidence Procedures. Bayesian probability does not measure frequency of occurrence, it measures plausibility under assumptions.

Topic		Replies	Views
Extracting each iteration result in KissABC.jl New to Julia question , inference , modelling , kissabc	4	155	September 5, 2024
Function rand() or sample() for MCMC sampling and similar? Probabilistic Programming	19	1717	September 23, 2019
Whats going on in kissABC.jl package smc() function Machine Learning question , bayesian-inference , smc , approximate-bayesian , kissabc	5	218	September 6, 2024
ANN: DynamicHMC 2.0 Package Announcements announcement	57	4918	December 20, 2019
MCMC landscape Statistics question	99	13380	July 2, 2021

[ANN-RFC] KissABC.jl- Approximate Bayesian Computation

Related topics