Estimating a Multivariate Distribution

Ian_Slagle · May 28, 2020, 8:25pm

Is there a package that can estimate a multivariate distribution? The input to this estimation isn’t really set in stone right now, but preferably it’s something that Optim.jl could work with. I essentially want something that can do something similar to KernelDensity.jl or AverageShiftedHistograms.jl, but with a multivariate distribution. There are only two dimensions to the specific project I’m working on, but they’re dependent, so the bivariate capabilities of the two aforementioned packages don’t apply, I think. Is defining a custom MultivariateDistribution the way to go here?

Tamas_Papp · May 29, 2020, 12:34pm

Any multivariate distribution? Such a problem is not well-specified. E.g., the discrete distribution of the sample points could be a valid answer.

You would need to add some details. E.g., multivariate normal or t distributions are quite easy to estimate using likelihood-based methods, but they may not fit your data well (this depends on your data, which we do not know). Mixtures or nonparametric methods would handle this, as would other, more advanced methods.
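As a rough illustration of the likelihood-based route in the bivariate normal case: the maximum-likelihood estimates are the sample mean and the sample covariance divided by n, and sampling then goes through a 2x2 Cholesky factor. This is a plain-Python sketch (standard library only, not a Julia package API); the function names are hypothetical, and it assumes the fitted covariance is positive definite.

```python
import math
import random

def fit_bivariate_normal(points):
    """Maximum-likelihood fit of a bivariate normal to (x, y) pairs.

    Returns (mean, cov): the MLE mean is the sample mean, and the MLE
    covariance divides by n rather than n - 1.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def sample_bivariate_normal(mean, cov, rng=random):
    """Draw one (x, y) point via the Cholesky factor L of cov (cov = L L^T).

    Assumes cov is positive definite.
    """
    (sxx, sxy), (_, syy) = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2
```

The fit captures linear dependence through the off-diagonal term, but a single normal will still fit poorly if the data are, say, banana-shaped or multimodal, which is exactly the caveat above.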

dlakelan · May 29, 2020, 1:38pm

It seems you should try the bivariate KDE first. If you’re just trying to make plots or evaluate a density at some points, this might be enough. The dependency should be evident if the data are sufficient.

Ian_Slagle · May 29, 2020, 3:05pm

Sorry, I think the word “fit” here was the wrong choice. What I’m looking for is a way to convert a two-dimensional histogram (or similar data, like a two-dimensional matrix) into a continuous multivariate distribution or its pdf, so that I can sample from it. In that sense it is “any multivariate distribution”, but not in the sense that I’m trying to “fit” it against an infinite number of candidate distributions. I believe the two dimensions are dependent, but I’ll try using the bivariate KDE as @dlakelan suggested.

dlakelan · May 29, 2020, 3:11pm

In 1D, the bandwidth parameter controls the “width” of the kernel. In 2D, there is not only a width in each dimension but also a covariance structure to the kernel. If you can adapt the covariance structure to the data, your kernel can do a better job. But in general that’s a hard problem, and the right covariance can change from place to place. Imagine a distribution that looks like a banana. In one place in x,y space the data stretch out vertically; as you move around the banana they may stretch out diagonally, and later along the banana they may be horizontal… In N dimensions this obviously just gets worse and worse.
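One simple way to adapt the kernel’s covariance to the data, globally rather than locally, is to use a Gaussian kernel whose covariance is the sample covariance scaled by factor², with factor = n^(-1/6) (Scott’s rule of thumb for two dimensions; that choice is an assumption here, not something from the thread). A single global bandwidth matrix cannot follow a banana whose orientation changes from place to place; that would need a locally adaptive kernel, which this plain-Python sketch (hypothetical helper name) does not attempt.

```python
import math

def full_covariance_kde(points, factor=None):
    """Return pdf(x, y) for a 2D Gaussian KDE whose kernel covariance is
    factor**2 times the sample covariance of `points`."""
    n = len(points)
    if factor is None:
        # Scott's rule of thumb for d = 2 dimensions: n ** (-1 / (d + 4))
        factor = n ** (-1 / 6)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Unbiased sample covariance (needs n >= 2 and non-collinear points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Kernel covariance H = factor^2 * S
    hxx, hyy, hxy = factor ** 2 * sxx, factor ** 2 * syy, factor ** 2 * sxy
    det = hxx * hyy - hxy ** 2
    norm = 1.0 / (n * 2.0 * math.pi * math.sqrt(det))

    def pdf(x, y):
        total = 0.0
        for px, py in points:
            dx, dy = x - px, y - py
            # Quadratic form d^T H^{-1} d using the closed-form 2x2 inverse
            q = (hyy * dx * dx - 2.0 * hxy * dx * dy + hxx * dy * dy) / det
            total += math.exp(-0.5 * q)
        return norm * total

    return pdf
```

Because every kernel is tilted along the overall correlation, this helps when the dependence is roughly linear; it does nothing for curvature.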

But for 2D with sufficient data, you can get a smooth KDE with a bivariate independent kernel that works well enough for many purposes. Give it a try. Better yet, come back and give us a plot of what you got and how well it worked!
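To make the “bivariate independent kernel” concrete: each data point contributes a product of two 1D Gaussians, so the kernel itself has no correlation, yet the estimate still shows dependence because the kernels sit on the (dependent) data points. A minimal plain-Python sketch; the default bandwidths use a per-dimension Scott-style rule of thumb, which is an assumption you would normally tune.

```python
import math
import statistics

def product_kernel_kde(xs, ys, hx=None, hy=None):
    """Return pdf(x, y) for a 2D KDE with an independent (product) Gaussian kernel.

    Each point (xi, yi) contributes N(xi, hx^2) * N(yi, hy^2).
    """
    n = len(xs)
    # Rule-of-thumb bandwidths, sd * n^(-1/6) for d = 2 (an assumption)
    if hx is None:
        hx = statistics.stdev(xs) * n ** (-1 / 6)
    if hy is None:
        hy = statistics.stdev(ys) * n ** (-1 / 6)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def pdf(x, y):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / hx) ** 2 - 0.5 * ((y - yi) / hy) ** 2)
            for xi, yi in zip(xs, ys)
        )

    return pdf
```

With points scattered along the diagonal, the density at (5, 5) comes out much higher than at (5, 0) even though no kernel is correlated, which is the sense in which “the dependency should be evident if the data are sufficient.”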

Ian_Slagle · May 29, 2020, 9:35pm

I don’t know what I was thinking, but I now understand that there is no distinction between a bivariate and a two-dimensional multivariate distribution; in particular, neither one requires the two dimensions to be independent. I think I was just confused by KernelDensity.jl’s bivariate kde taking two vectors as input. Thank you, and sorry for the misunderstanding.

nboyd · May 29, 2020, 10:03pm

If you don’t need the PDF but only need to sample from the distribution, you can add Gaussian noise to a random point sampled from your data. This is equivalent to sampling from a KDE with a Gaussian kernel whose bandwidth equals the standard deviation of the noise.
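The equivalence holds because a Gaussian KDE is an equal-weight mixture of Gaussians centred on the data points: choosing a point uniformly picks a mixture component, and the added noise draws from that component. A minimal plain-Python sketch (hypothetical helper name), with separate noise scales hx and hy playing the role of the per-dimension bandwidths:

```python
import random

def sample_kde(points, hx, hy, k, rng=random):
    """Draw k samples from a product-Gaussian-kernel KDE built on `points`.

    Pick a data point uniformly at random, then add independent
    N(0, hx^2) and N(0, hy^2) noise to its coordinates.
    """
    samples = []
    for _ in range(k):
        x, y = rng.choice(points)
        samples.append((x + rng.gauss(0.0, hx), y + rng.gauss(0.0, hy)))
    return samples
```

Any dependence in the data carries over automatically, since each sample starts from an observed (x, y) pair.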

If you only have a histogram, you could sample a bin according to its mass, sample a point uniformly from within that bin, and then add Gaussian noise.
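The histogram recipe can be sketched in plain Python as below, assuming the histogram is given as bin edges plus a counts matrix (a hypothetical layout; adapt it to whatever your histogram object stores). With noise_sd = 0 the result is a piecewise-constant density; a positive noise_sd smooths across bin boundaries.

```python
import random

def sample_histogram(x_edges, y_edges, counts, k, noise_sd=0.0, rng=random):
    """Draw k points from a 2D histogram.

    counts[i][j] is the mass of the bin [x_edges[i], x_edges[i+1]) x
    [y_edges[j], y_edges[j+1]). Each draw picks a bin with probability
    proportional to its mass, places a point uniformly inside it, and then
    optionally adds isotropic Gaussian noise with standard deviation noise_sd.
    """
    bins = [(i, j) for i in range(len(x_edges) - 1) for j in range(len(y_edges) - 1)]
    weights = [counts[i][j] for i, j in bins]
    samples = []
    for i, j in rng.choices(bins, weights=weights, k=k):
        x = rng.uniform(x_edges[i], x_edges[i + 1])
        y = rng.uniform(y_edges[j], y_edges[j + 1])
        if noise_sd > 0:
            x += rng.gauss(0.0, noise_sd)
            y += rng.gauss(0.0, noise_sd)
        samples.append((x, y))
    return samples
```

Since whole bins are chosen jointly, dependence between the two dimensions that is visible in the histogram is preserved in the samples.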

Related topics

[ANN] MultiKDE.jl: A Lazy Evaluation Multivariate Kernel Density Estimator · Package Announcements (package, announcement, statistics) · 2 replies · 1249 views · June 30, 2021
Get conditional kernel densities in julia · Probabilistic Programming (question, distributions) · 0 replies · 605 views · January 27, 2021
Sample from Kernel Density Estimator · Statistics (question) · 3 replies · 2043 views · November 25, 2020
Product of univariate distributions · Statistics · 13 replies · 2447 views · February 18, 2018
How to write a custom distribution using pdf from a Kernel Density Estimator? · Performance (statistics, turing, distributions) · 4 replies · 567 views · September 14, 2021