Approximating a Numerically Computed Distribution

I have a very simple, numerically computed posterior (note that I’m not randomly sampling from the posterior but actually calculate it’s (log-)densities over a given interval) that I know converges to a Normal distribution for high sample sizes. For a somewhat simpler case, I actually have a closed-form approximation, but I now want to generalize my solution (and also possible check any other closed-form approximations).

So I was wondering if there is an elegant way to approximate a pdf, e.g. using the Distributions.jl package? My naive idea would be to use ApproxFun.jl but I was kinda hoping for a more straightforward solution. I could also sample from my posterior and then use fit(Normal, ...) but that can’t be the way to do this…

Do you mean that you have computed it over a grid of values? In that case, maybe what you want is to interpolate between grid points?

I’m actually dealing with a dimension case here, so I’m not sure if a “grid” is a fair description, but yeah, that is basically the idea. But I actually want to fit a Gaussian, i.e. estimating the two parameters μ and σ.

take a look at Distributions.jl/stable/fit and
how to fit a normal distribution

But this assumes that I have observations sampled from some distribution, doesn’t it? Actually, I have a numerically computed pdf over a grid / number of evenly spaced points.

Of course, I can just sample from my pdf and then take the data’s moments (mean, variance) or use fit() to get my normal distribution. It just seems a bit unnecessary to go through this… I guess I was justing hoping for a more elegant way to do this.

(I usually would just go with MCMC and then just just fit() on the samples, but for the simple 1-d case, Turing is order of magnitude slower than just computing the posterior over a grid)

I still don’t quite understand what you want. My latest interpretation is that you have computed some pdf values over a (1D) grid, now you want to compute the mean and standard deviation of that pdf. Is that right? Depending on how much of the support you have covered with your grid points, you could interpolate to create a “filled in” pdf and then just compute the mean and standard deviation using QuadGK with the integral formulas of those statistics. This is much more general than fitting a Gaussian.

Alternatively, if you for some reason don’t have enough points for that, you could do a least squares fit to the closest Normal (or some other) distribution. You could minimize the distance between your pdf(x) values and pdf(Normal(m,s),x) where you are optimizing over m and s. There are a couple of loss functions you can use to do this.

1 Like