[ANN-RFC] KissABC.jl- Approximate Bayesian Computation

Thanks for the very interesting package :+1:. I have been playing around a bit with the mixture model in the documentation. I find that the confidence intervals seem to be too broad, the true parameters are inside more than 90% of the times it is repeated, in my limited experimentation. For example, running the code attached below, which uses 10 repetitions, I get, in three runs

mean(inci, dims = 1) = [1.0 1.0 1.0 0.8 1.0]
mean(inci, dims = 1) = [1.0 1.0 1.0 1.0 0.9]
mean(inci, dims = 1) = [1.0 1.0 1.0 0.9 1.0]


Each time this is run, we compute 10 CIs for each of 5 parameters. So, in the three repetitions, 150 CIs were computed. There were 4 rejections of a true parameter, when we would expect 15. I was wondering if there’s a good way to tune this to try to make the CI coverage more accurate. The code I ran follows. It uses the code for the mixture model, but with a small addition to keep track of the confidence interval results. Thanks again!

#using Pkg
#Pkg.activate("./")
using KissABC
using Distributions
using Plots

function model(P,N)
    μ_1, μ_2, σ_1, σ_2, prob=P
    d1=randn(N).*σ_1 .+ μ_1
    d2=randn(N).*σ_2 .+ μ_2
    ps=rand(N).<prob
    R=zeros(N)
    R[ps].=d1[ps]
    R[.!ps].=d2[.!ps]
    R
end

function D(x,y)
    r=0:0.01:1
    sum(abs2,quantile.(Ref(x),r).-quantile.(Ref(y),r))/length(r)
end

function getstats(P,V)
    (
        param=P,
        median=median(V),
        lowerbound=quantile(V,0.05),
        upperbound=quantile(V,0.95)
    )
end

function main(reps)
inci = zeros(reps,5) # holder for true parameters in 90% CI
for r = 1:reps
    println("repetition ", r)
    parameters = (1.0, 0.0, 0.2, 2.0, 0.4)
    data=model(parameters,5000)
    #histogram(data)
    prior=Factored(
                Uniform(0,3), # there is surely a peak between 0 and 3
                Uniform(-2,2), #there is a smeared distribution centered around 0
                Uniform(0,1), # the peak has surely a width below 1
                Uniform(0,4), # the smeared distribution surely has a width less than 4
                Beta(4,4) # the number of total events from both distributions look about the same, so we will favor 0.5 just a bit
            );
    plan=ABCplan(prior,model,data,D,params=5000)
    res,Δ=ABCDE(plan,0.02,parallel=true,verbose=false);
    stats=getstats.((:μ_1, :μ_2, :σ_1, :σ_2, :prob),[getindex.(res,i) for i in 1:5])

    for is in eachindex(stats)
        #println(parameters[is], " → ", stats[is])
        inci[r, is] = stats[is][3]<parameters[is] && stats[is][4]>parameters[is]
    end
end

println("90% CI coverage")
@show mean(inci,dims=1)
return nothing
end

2 Likes