I want to calculate the Kullback-Leibler divergence between data I collected in a vector x, which I interpret as samples from an unknown distribution, and the standard normal distribution. The maths behind the KL divergence is straightforward. My naive approach would be to
choose a number of bins
make a histogram of x
discretize the density of the normal distribution according to the bins
calculate the KL divergence of the two vectors using, for example, kldivergence from StatsBase
I wonder how good an approach that is (conceptually and implementation-wise). Is there a Julia package with more refined methods? And how sensitive is the result to the number of bins?
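For concreteness, here is a minimal sketch of those steps (the sample vector and bin count are placeholder choices, assuming Distributions.jl and StatsBase.jl):

```julia
using Distributions, StatsBase

x = randn(1_000)                     # placeholder for the collected samples
nbins = 30                           # arbitrary choice
edges = range(minimum(x), maximum(x); length = nbins + 1)

# empirical pmf of x over the bins
h = fit(Histogram, x, edges)
p = h.weights ./ sum(h.weights)

# discretize the standard normal over the same bins via its cdf
d = Normal()
q = [cdf(d, edges[i + 1]) - cdf(d, edges[i]) for i in 1:nbins]
q ./= sum(q)                         # renormalize away the truncated tail mass

kldivergence(p, q)                   # finite only if q > 0 wherever p > 0
```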
@Palli thanks for your quick answer. If I understood correctly, the KL divergence in Distances.jl calculates the distance between two vectors, so conceptually it does the same thing as kldivergence from StatsBase.
I did not understand the use of the Sequencer.jl package. What problem does it solve, and how would I use it in my case?
Discretizing the normal is the right thing to do.
I’m sure you know this, but you want to normalize it as a discrete distribution, not as a density (i.e. ignore the bin widths). Also, you can add a very small constant to everything to avoid numerical issues if there are any zero bins.
I’ve heard it said that a good number of bins is the square root of the number of samples.
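As a concrete illustration of the smoothing (the ε value here is an arbitrary choice):

```julia
# additive smoothing: pad the raw bin masses before normalizing, so the KL
# divergence stays finite when a bin of q is empty while the same bin of p is not
smooth(w; ε = 1e-10) = (w .+ ε) ./ sum(w .+ ε)

counts = [0, 3, 7, 0]    # hypothetical raw histogram counts with empty bins
p = smooth(counts)       # strictly positive and sums to 1
```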
You can express the KL divergence in terms of an estimated density ratio p(x)/q(x), and that is usually more robust. All you need are samples from the two densities; there is no need to create bins.
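One way to make this concrete (not necessarily what was meant above, which may refer to direct density-ratio estimation): KL(P‖Q) = E_P[log p(x) − log q(x)], and since Q here is the standard normal in closed form, only log p needs to be estimated at the sample points, e.g. with a kernel density estimate. A sketch assuming KernelDensity.jl:

```julia
using Distributions, KernelDensity, Statistics

x = randn(1_000) .* 1.2 .+ 0.3   # placeholder samples from the unknown P
q = Normal()                     # the known reference distribution Q

k = InterpKDE(kde(x))            # kernel density estimate of p, evaluable pointwise
# Monte Carlo estimate of E_P[log p(x) - log q(x)] over the samples
kl = mean(log.(pdf.(Ref(k), x)) .- logpdf.(q, x))
```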
I did not understand the use of the Sequencer.jl package.
It’s off-topic, but cool. I didn’t read your question carefully enough and thought you were looking for an implementation, which I remembered having used there; then I realized it was only in a dependency, and I edited my answer.
Hi, I have a question regarding the KL divergence between two bivariate distributions P(x, y) and Q(x, y), where x is discrete but y is continuous. My strategy is to discretize the continuous variable using bins and then estimate the two joint distributions over the resulting grid. Does that sound okay?
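If it helps, here is a minimal sketch of that strategy (the paired samples, bin edges, and ε are all placeholder choices, assuming StatsBase.jl's multivariate histograms):

```julia
using StatsBase

# hypothetical paired samples: x discrete in 1:3, y continuous
xp, yp = rand(1:3, 5_000), randn(5_000)
xq, yq = rand(1:3, 5_000), randn(5_000) .+ 0.5

xedges = 0.5:1.0:3.5                  # one bin per discrete level of x
yedges = range(-5, 5; length = 21)    # arbitrary binning for y

hp = fit(Histogram, (xp, yp), (xedges, yedges))
hq = fit(Histogram, (xq, yq), (xedges, yedges))

# flatten the 2-d bin counts to joint pmfs, with a small ε so empty bins don't give Inf
ε = 1e-10
p = vec(hp.weights) .+ ε
q = vec(hq.weights) .+ ε
kldivergence(p ./ sum(p), q ./ sum(q))
```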