[ANN-RFC] KissABC.jl - Approximate Bayesian Computation

Yes, I know, but asymptotically the quantiles of the posterior define correct confidence intervals under the frequentist interpretation (see "An MCMC approach to classical estimation" on ScienceDirect, Theorem 3, for example).

The issue of the variance being larger than with full-sample inference, and thus leading to broader confidence intervals, is different from the confidence intervals having correct coverage.

So it seems to me that it would be desirable to find a tuning method so that what I'm loosely calling confidence intervals contain the true parameter values the appropriate percentage of the time, at least when the sample is large enough that asymptotic results should be a good approximation.
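To make "correct coverage" concrete, here is a tiny self-contained toy example (a conjugate normal-mean model, not KissABC and not the mixture model under discussion): with a flat prior and known noise scale, the interval formed by the posterior quantiles contains the true mean about 95% of the time when the experiment is repeated over many datasets.

using Statistics

# Toy coverage check: data ~ N(μ_true, 1); with a flat prior and known σ = 1
# the posterior is N(mean(x), 1/n), so the 2.5%/97.5% posterior quantiles
# give an interval whose frequentist coverage should be ≈ 0.95.
function coverage(μ_true; n=100, reps=1_000)
    hits = 0
    for _ in 1:reps
        x = randn(n) .+ μ_true
        post_mean, post_sd = mean(x), 1 / sqrt(n)
        lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
        hits += (lo <= μ_true <= hi)
    end
    hits / reps
end

coverage(2.0)   # ≈ 0.95 if the intervals are calibrated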


I don't have access to that paper unfortunately, but in general it's easy to get Bayesian models that have no frequentist interpretation (though in your case that's not the issue, since you're literally using an RNG). And how far into the asymptotic range you have to go can vary depending on the model.

It looks like you're using 5000 samples in your model. Does it get closer to a calibrated confidence interval if you move to 10,000 or 100,000?

At 5000 samples it should already be damn close to the MLE prediction. Indeed, with an improved distance function the intervals are now much tighter, though even now it is not optimal; ABC methods require a lot of tuning to yield exact Bayesian posteriors.
@mcreel you can improve the results of ABC via regression adjustment. For now I do not plan on introducing such methods in KissABC since they are not general purpose, but they can be pretty effective as a post-processing procedure on the results of an ABC inference algorithm.
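For anyone curious what regression adjustment looks like, here is a minimal sketch (not part of KissABC, and with hypothetical names) of a global linear adjustment in the spirit of Beaumont-style post-processing: regress the accepted parameter draws on their simulated summary statistics, then shift each draw to what the fit predicts at the observed summaries.

using LinearAlgebra

# θ: accepted draws of one parameter (length N)
# S: N×k matrix of the corresponding simulated summary statistics
# s_obs: the k observed summary statistics
function regression_adjust(θ::AbstractVector, S::AbstractMatrix, s_obs::AbstractVector)
    X = hcat(ones(size(S, 1)), S)          # design matrix with an intercept
    β = X \ θ                              # least-squares fit θ ≈ X β
    fitted_at_obs = dot(vcat(1.0, s_obs), β)
    θ .- X * β .+ fitted_at_obs            # shift each draw toward the observed summaries
end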


I think the main issue is that the distance measure is not very good for the purpose of arriving at good confidence intervals. The KissABC.jl documentation notes "now we need a distance function to compare datasets, this is possibly the worst distance we could use, but it will work out anyway". The problem, I believe, is that the distance measure gives equal weight to statistics that probably have very different variances. The tail quantiles very likely have much larger variances than the central quantiles. With improper weighting of statistics, the Theorem 3 I cited above will not hold. So I believe that with other choices of distance measure the issue could be resolved. But, possibly, there may be ways of tuning the algorithm to get around the problem even with the chosen distance measure. That's what my question is focused on.

I want to note that the distance measure is only an example; it's not part of the algorithms that the package provides.
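As an illustration of the weighting idea, here is a minimal sketch (purely hypothetical, not from the package or its docs) of a quantile distance that down-weights the noisy tail quantiles using a bootstrap estimate of each quantile's sampling variance:

using Statistics

# Bootstrap estimate of the sampling variance of each quantile of the observed data
function quantile_variances(data; probs=0.01:0.01:0.99, B=200)
    n = length(data)
    boots = reduce(hcat, [quantile(data[rand(1:n, n)], probs) for _ in 1:B])
    vec(var(boots; dims=2))
end

# Weighted Euclidean distance between vectors of simulated and observed quantiles
weighted_quantile_distance(qsim, qobs, w) = sqrt(sum(w .* (qsim .- qobs) .^ 2))

# usage sketch:
# probs = 0.01:0.01:0.99
# w = 1 ./ quantile_variances(observed_data; probs=probs)
# dist(s, s0) = weighted_quantile_distance(s, s0, w)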


That makes sense. Yes, using all the quantiles from 0:0.01:1 as equally important means you will need to go to a very large sample size before the distance metric itself behaves asymptotically at all... The 1% and 99% quantiles are high variance, for example.

In essence, fitting an entire distribution is an infinite-dimensional problem, and there is no asymptotic sample number when the dimensionality is infinite. Of course in this case you've restricted it to a finite, roughly 100-dimensional approximation, and eventually you'd get into an asymptotic range, but I imagine for that model the asymptotic range is well above 5000 samples.
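A quick way to see the point about the tails, using a standard normal toy example rather than the model above:

using Statistics

# Sampling variance of a given quantile, estimated over repeated datasets of size n
n, reps = 5_000, 1_000
qvar(p) = var([quantile(randn(n), p) for _ in 1:reps])

qvar(0.5), qvar(0.99)   # the 99% quantile is roughly an order of magnitude noisier than the median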

Just released KissABC 1.3, now supporting Kernel Differential Evolution, which is way more sample-efficient than hard-threshold ABC!

For those interested, look for the method KABCDE, and as always let me know if you find any issues; I'll do my best to solve them almost in real time :slight_smile:


Awesome! I am looking forward to trying this out at some point.

I gave it a try with my same toy mixture model.

  • Compared with version 1.0, I get slightly tighter posteriors for ABCDE with earlystop=true and similar run times.
  • But with the new default setting (earlystop=false), it seems to never converge, despite reaching "completion = 1.0" quickly. I'm not sure if there's a termination bug or it really takes that much longer to run (I killed it after it had been running 10x longer than with earlystop).
  • KABCDE seems to work okay, but with the same settings it runs 2.5x longer than ABCDE and results in posteriors that are 5-10x wider.

Hey, thank you for trying it out. The new changes are due to the fact that some hard problems require refinement steps even after reaching the target tolerance, so earlystop defaults to false; the reason the algorithm is running longer is that the parameter generations defaults to 500.

I think you should tune it according to your problem. By the way, the KABCDE method uses the epsilon value in a completely different way, so to obtain comparable results between kernelized and non-kernelized methods the epsilon must be chosen accordingly (and you must consider the weights when using the kernelized methods in order to have proper CIs).

The past performance can easily be attained and surpassed with some care; I simply didn't want to use defaults that could lead to wrong results.
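In case it helps, here is a small sketch of what "considering the weights" might look like when summarizing the output of a kernelized method; the function name is hypothetical, and the exact way KissABC returns particles and weights may differ:

# Quantile of weighted particles: sort the particles and read off where the
# cumulative normalized weight first reaches p.
function weighted_quantile(x::AbstractVector, w::AbstractVector, p::Real)
    idx = sortperm(x)
    xs, ws = x[idx], w[idx] ./ sum(w)
    c = cumsum(ws)
    xs[something(findfirst(>=(p), c), lastindex(xs))]
end

# e.g. a 95% credible interval for one parameter:
# lo = weighted_quantile(particles, weights, 0.025)
# hi = weighted_quantile(particles, weights, 0.975)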


Thanks! After letting ABCDE run a bit longer (it needed about 14x longer without earlystop), I got similar results, so I guess my problem just isn't difficult enough, as you said.

As for KABCDE, yes, I forgot to consider the weights! After running 2.5x as long as ABCDE with earlystop, I now get CIs that are only 1.5x wider. So still not quite as good. Is there anything you would recommend tuning? Or do you only expect it to be more sample efficient than ABCDE without earlystop?

@robsmith11

I deeply care about improving performance as much as possible, so I experimented a bit with your problem:

using KissABC, Distributions, Random, Statistics

# Simulate the mixture of exponentials and summarize it by (std, median)
function sim((u1, p1), params; n=10^6, raw=false)
    u2 = (1.0 - u1*p1)/(1.0 - p1)
    x = randexp(n) .* ifelse.(rand(n) .< p1, u1, u2)
    raw && return x
    [std(x), median(x)]
end

# Relative distance between simulated and observed summary statistics
function dist(s, s0)
    sqrt(sum(((s .- s0) ./ s) .^ 2))
end

plan = ABCplan(Factored(Uniform(0, 1), Uniform(0.5, 1)), sim, [2.2, 0.4], dist)

res, del, conv = ABCDE(plan, 0.01, nparticles=100, generations=30, parallel=true)

using Statistics

# Report the (0.25, 0.5, 0.75) quantiles, i.e. the median plus a 50% central interval
function getCI(x::Vector{<:Number})
    quantile(x, [0.25, 0.5, 0.75])
end
# For tuples of parameters, compute the quantiles of each component separately
function getCI(x::Vector{<:Tuple})
    [getCI(getindex.(x, i)) for i in 1:length(x[1])]
end

Results via getCI(res):
240 generations:
 [0.48958933397111065, 0.4924062224370781, 0.49559446402487584]
 [0.879783065265908, 0.8816472031816496, 0.8835803050367947]
120 generations:
 [0.4893221894893949, 0.49278449533673585, 0.494863093578758]
 [0.8795982153875357, 0.8816146345915951, 0.8829915018185673]
60 generations:
 [0.4887524655164148, 0.49234470862896673, 0.49502567359353133]
 [0.8796953221457162, 0.8814094610516047, 0.8833899764047788]
30 generations:
 [0.48893164848681747, 0.49163740259340305, 0.4944261757524125]
 [0.8795333045815598, 0.8809134893725196, 0.8829819098348083]
early stop:
 [0.49006664933267297, 0.49313860531909304, 0.49497013116625105]
 [0.8804136291097875, 0.8819843728641816, 0.8834306754737902]

I see very good convergence on this problem; even 30 generations lead to reasonable CIs, so for ABCDE I do not see a clear-cut performance regression, though without enforced early stopping there is a bit of tuning involved. What I did see instead is that KABCDE is not performing very well on this problem; I will think of something to improve its performance.

Is there anything I've missed?

Can someone please explain the purpose of the argument param?
It does not seem to do anything!

sim((μ, σ), param) = randn(100) .* σ .+ μ

param is used to pass any parameters that your sim function might need, but which you don't want to do SMC on.
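For example (a hypothetical sketch; how param actually gets supplied depends on the KissABC API version you are using), fixed quantities such as the sample size can ride along in param while only (μ, σ) are inferred:

# `param` carries fixed settings (here a NamedTuple with the sample size);
# only (μ, σ) are the parameters being inferred.
sim((μ, σ), param) = randn(param.n) .* σ .+ μ

# purely illustrative call, outside of any ABC run:
sim((2.0, 0.04), (n = 1000,))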


Thanks. Agreed, by cutting the number of generations ABCDE can be made very fast with good results even without early stopping.

tdata = randn(1000) .* 0.04 .+ 2

sim((μ, σ), param) = randn(100) .* σ .+ μ

The next thing I do not understand is why the test data tdata has 1000 elements, but the sim function only generates an array with 100 elements.

Why 100 elements? Why not 1000 elements, to match the test data? Or why not just 10 elements, or 1_000_000 elements? It's all very mysterious and confusing.


That is a mistake, it should be the same; I will fix it when I get a chance.

Is there a reason why the number of samples in the simulated and test data should match?

I usually just adjust the number of simulated samples to keep the noise of my chosen statistic within a desired range while minimizing the simulation cost. Depending on the amount of test data available and the specifics of the statistic, this may mean taking more or fewer simulation samples.
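For example, one rough way to pick the number of simulated samples (a toy sketch on a standard normal model, not specific to the problem above) is to look at how the Monte Carlo noise of the summary statistic shrinks as the simulation size grows, and stop once it is small relative to the tolerance you care about:

using Statistics

# Monte Carlo standard deviation of the summary statistic (here the median of a
# standard normal sample) as a function of the number of simulated samples n
stat_sd(n; reps=200) = std([median(randn(n)) for _ in 1:reps])

[(n, stat_sd(n)) for n in (100, 1_000, 10_000)]   # noise falls roughly like 1/sqrt(n)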


I found a YouTube video on ABC: "The ABC's of ABC (Approximate Bayesian Computation)".

In theory they must be the same; in practice it is very convenient to have them differ.


I'm having trouble getting similar results with KissABC 2.0 compared to v1.3. Will there be more documentation coming for the various new parameters?

Does the method AIS() correspond to ABCDE in some way?

Thanks
