I’m looking for a package that implements tools for working with non-parametric multivariate discrete distributions. Ideally, this would look something like a multivariate version of DiscreteNonParametric in Distributions.jl. I’m sure this wouldn’t be a big lift to implement, but I don’t want to reinvent the wheel if it’s already been done.
If it is just for sampling, Distributions.jl contains utilities to build product distributions. However, I’m not sure they support parameter fitting.
Unfortunately, not every joint distribution can be represented as a product of its marginals.
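E.g.
```julia
using LinearAlgebra

a = normalize(rand(3, 4), 1)  # random 3×4 joint pmf; entries sum to 1
a1 = sum(a, dims=1)           # marginal of the second variable (1×4)
a2 = sum(a, dims=2)           # marginal of the first variable (3×1)
anew = a2 * a1                # outer product of the marginals (3×4)
anew ≈ a                      # false
```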
In this case, anew is a valid joint distribution equal to the product of the marginals of a, but it is not equal to a.
What kind of representation do you need? If you don’t mind an “extended” representation (as in, every tuple has its own probability mass and we disregard the connections between said masses), you can always encode all your tuples as integers, even though it’s a hassle.
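As a rough illustration of the “extended” representation, here is a minimal sketch that uses only Base and the Random standard library. The names joint, encode, decode, probs and sample_tuple are made up for this example, and the 3×4 table is just random data. It assumes the joint pmf fits in a dense array, so every tuple (i, j) gets its own mass and its own integer code.

```julia
using Random

# Hypothetical joint pmf over pairs (i, j) with i in 1:3 and j in 1:4.
joint = rand(MersenneTwister(1), 3, 4)
joint ./= sum(joint)             # normalize so all masses sum to 1

# Encode each tuple (i, j) as a single integer code and back.
encode = LinearIndices(joint)    # encode[i, j] -> integer in 1:12
decode = CartesianIndices(joint) # decode[k]    -> CartesianIndex(i, j)

# Flattened masses: probs[encode[i, j]] == joint[i, j]
probs = vec(joint)

# Draw one tuple by inverse-CDF sampling over the integer codes.
function sample_tuple(rng, probs, decode)
    k = searchsortedfirst(cumsum(probs), rand(rng))
    k = min(k, length(probs))    # guard against floating-point round-off
    return Tuple(decode[k])
end

draw = sample_tuple(MersenneTwister(2), probs, decode)
```

If you would rather not hand-roll the sampling, the flattened probs vector should also be usable as the probability vector of a Categorical or DiscreteNonParametric from Distributions.jl, with decode mapping the sampled integers back to tuples.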
On the other hand, if you want a “compact” representation, where the dependencies between probability masses are accounted for (typically this would make sense for your example above), you may want to look for libraries to handle probabilistic graphical models.