Thanks for the very interesting package . I have been playing around a bit with the mixture model in the documentation. I find that the confidence intervals seem to be too broad, the true parameters are inside more than 90% of the times it is repeated, in my limited experimentation. For example, running the code attached below, which uses 10 repetitions, I get, in three runs
mean(inci, dims = 1) = [1.0 1.0 1.0 0.8 1.0]
mean(inci, dims = 1) = [1.0 1.0 1.0 1.0 0.9]
mean(inci, dims = 1) = [1.0 1.0 1.0 0.9 1.0]
Each time this is run, we compute 10 CIs for each of 5 parameters. So, in the three repetitions, 150 CIs were computed. There were 4 rejections of a true parameter, when we would expect 15. I was wondering if there’s a good way to tune this to try to make the CI coverage more accurate. The code I ran follows. It uses the code for the mixture model, but with a small addition to keep track of the confidence interval results. Thanks again!
#using Pkg
#Pkg.activate("./")
using KissABC
using Distributions
using Plots
function model(P,N)
μ_1, μ_2, σ_1, σ_2, prob=P
d1=randn(N).*σ_1 .+ μ_1
d2=randn(N).*σ_2 .+ μ_2
ps=rand(N).<prob
R=zeros(N)
R[ps].=d1[ps]
R[.!ps].=d2[.!ps]
R
end
function D(x,y)
r=0:0.01:1
sum(abs2,quantile.(Ref(x),r).-quantile.(Ref(y),r))/length(r)
end
function getstats(P,V)
(
param=P,
median=median(V),
lowerbound=quantile(V,0.05),
upperbound=quantile(V,0.95)
)
end
function main(reps)
inci = zeros(reps,5) # holder for true parameters in 90% CI
for r = 1:reps
println("repetition ", r)
parameters = (1.0, 0.0, 0.2, 2.0, 0.4)
data=model(parameters,5000)
#histogram(data)
prior=Factored(
Uniform(0,3), # there is surely a peak between 0 and 3
Uniform(-2,2), #there is a smeared distribution centered around 0
Uniform(0,1), # the peak has surely a width below 1
Uniform(0,4), # the smeared distribution surely has a width less than 4
Beta(4,4) # the number of total events from both distributions look about the same, so we will favor 0.5 just a bit
);
plan=ABCplan(prior,model,data,D,params=5000)
res,Δ=ABCDE(plan,0.02,parallel=true,verbose=false);
stats=getstats.((:μ_1, :μ_2, :σ_1, :σ_2, :prob),[getindex.(res,i) for i in 1:5])
for is in eachindex(stats)
#println(parameters[is], " → ", stats[is])
inci[r, is] = stats[is][3]<parameters[is] && stats[is][4]>parameters[is]
end
end
println("90% CI coverage")
@show mean(inci,dims=1)
return nothing
end