How to compute density of conditional probability from data

Thanks for the ideas. I will need some time to get into these things, since it’s also not directly my specialisation.

At first glance, I cannot find a function for the GaussianMixture model to which I can pass two datasets: one that defines the sampling probability and one with the data of interest.
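The two-dataset problem can be sidestepped whatever density estimator is used. This is a sketch, not something taken from the GaussianMixture API. Fit two densities separately and combine them with Bayes' rule: P(S = 1 | x) = P(S = 1) · f(x | S = 1) / f(x). Here f is estimated from all samples and f(· | S = 1) from the successes only.

The sketch swaps in a hand-written Gaussian kernel density estimate for the mixture model. Several things in it are assumptions for illustration only: the helpers `kde_at` and `cond_prob`, the bandwidth 0.05, and the dependence P(S = 1 | x) = x / 2 in the test data. When the same kernel is used in both estimates, the factor P(S = 1) cancels. The result is then simply the kernel-weighted fraction of successes near x, which is the Nadaraya-Watson estimator.

```julia
using Random, Statistics

# Hypothetical helper: Gaussian kernel density estimate of `data` at the point `x`.
# The bandwidth `bw` is an assumed value, not a tuned one.
function kde_at(data, x; bw = 0.05)
    s = 0.0
    for d in data
        s += exp(-0.5 * ((x - d) / bw)^2)
    end
    return s / (length(data) * bw * sqrt(2π))
end

# Hypothetical helper: P(S = 1 | x) ≈ P(S = 1) * f(x | S = 1) / f(x)  (Bayes' rule),
# with f(x) estimated from all samples and f(x | S = 1) from the successes only.
function cond_prob(X, success, x; bw = 0.05)
    p_success = mean(success)
    return p_success * kde_at(X[success], x; bw = bw) / kde_at(X, x; bw = bw)
end

# Test data shaped like the example in this post, but with an assumed
# dependence P(S = 1 | x) = x / 2 so that there is something to recover.
rng = MersenneTwister(1)
m = 5_000
X = vcat(rand(rng, 9m), 1.0 .+ rand(rng, m))
success = rand(rng, length(X)) .< X ./ 2

for x in (0.5, 1.5)
    println("x = $x: estimate ", round(cond_prob(X, success, x), digits = 3), ", true value ", x / 2)
end
```

Any other density estimator, for example a mixture model fitted once to all samples and once to the successes, could replace `kde_at`. The estimate is unreliable close to 0, 1 and 2, where the kernel straddles a jump in the density of X.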


On a more basic level, what I don't understand is why the weighted histogram based on the weight vectors of StatsBase.jl (the "Weight Vectors" page of its documentation) does not perform well on this test case (see the plot below). Maybe it is just this extreme example, and I am sure the StatsBase methods have many advantages.


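```julia
using StatsBase, LinearAlgebra
using CairoMakie

m = 1_000_000
# X: 90% of the samples uniform on [0, 1), 10% uniform on [1, 2)
X = vcat(rand(9m), 1.0 .+ rand(m))
# S: success indicator, drawn independently of X, so P(S = 1 | X) = 0.5 everywhere
S = rand(0:1, 10m)
success = S .== 1

# use the same bin edges for every histogram so that the bins line up
edges = range(0, 2; length = 61)   # 60 bins on [0, 2)

# approach 1: ratio of successes to all samples in each bin
h_total   = fit(Histogram, X, edges)
h_success = fit(Histogram, X[success], edges)
h_cond    = Histogram(h_total.edges, h_success.weights ./ h_total.weights)
normalize!(h_cond)   # rescale so that the histogram integrates to 1 (mode = :pdf)

# approach 2: weighted histogram of the successes
# note: the weights used here are the X values themselves
w = pweights(X)[success]
h_weights = fit(Histogram, X[success], w, edges)

f = Figure()
p1 = plot(f[1, 1], h_cond)
p1.axis.title[] = "h_success ./ h_total"
p2 = plot(f[2, 1], h_weights)
p2.axis.title[] = "h_weights"
f
```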
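Here is one likely explanation, assuming approach 2 is meant to estimate the same conditional probability as approach 1. `pweights(X)` uses the sample values as weights, so each bin of `h_weights` adds up the x values of its successes. Up to a factor of roughly the bin center, that is the success count, which follows the shape of X instead of being flat at 0.5. One alternative (my assumption, not taken from StatsBase) is to weight each success by 1 / (number of all samples in its bin). With that choice the weighted histogram reproduces the ratio from approach 1 exactly.

The sketch checks this in plain Julia. It uses a hypothetical `bincounts` helper instead of StatsBase, so it runs on its own, and it uses smaller sample sizes and 20 bins for speed.

```julia
using Random

# Hypothetical helper: (weighted) number of samples per bin [edges[i], edges[i+1]).
function bincounts(x, edges, w = ones(length(x)))
    counts = zeros(length(edges) - 1)
    for (xi, wi) in zip(x, w)
        i = searchsortedlast(edges, xi)
        if 1 <= i < length(edges)
            counts[i] += wi
        end
    end
    return counts
end

rng = MersenneTwister(1)
m = 10_000
X = vcat(rand(rng, 9m), 1.0 .+ rand(rng, m))   # same shape as in the post
success = rand(rng, 0:1, 10m) .== 1            # independent of X, as in the post
Xs = X[success]
edges = range(0, 2; length = 21)               # 20 bins on [0, 2)

n_total   = bincounts(X, edges)
n_success = bincounts(Xs, edges)

# approach 1: ratio of counts, ≈ 0.5 in every bin
ratio = n_success ./ n_total

# approach 2 with the weights from the post: each success is weighted by its own x,
# so each bin holds roughly (bin center) * n_success and follows the shape of X
h_x = bincounts(Xs, edges, Xs)

# approach 2 with inverse-count weights: 1 / (number of all samples in the success's bin)
w_inv = [1 / n_total[searchsortedlast(edges, xi)] for xi in Xs]
h_inv = bincounts(Xs, edges, w_inv)            # equals `ratio` up to rounding
```

With StatsBase, the same weights could presumably be passed as `fit(Histogram, X[S .== 1], pweights(w_inv), edges)`. Since this only reproduces approach 1, the plain ratio of counts seems the simpler choice for this test case.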