How to compute the density of a conditional probability from data

I have samples X::Vector{Float64} and S = rand(0:1, 1000).
I call X[i] successful if S[i] == 1.

I would like to estimate the probability that a sample is successful, hence

p_x = \mathbb P(S = 1 \mid X = x) = \frac{\mathbb P(S = 1 \cap X = x)}{\mathbb P(X = x)}.
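
(For continuous X, I read this in the density sense, i.e. p_x = \mathbb P(S = 1) \, f_{X \mid S = 1}(x) / f_X(x), which is what the kernel density ratio below approximates.)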

Now, I don’t know how to do this with data.
I’m sure this is a standard problem, but I don’t know what to search for…

Here is my attempt so far.
In this example, p_x = 0.5 for all points; the challenge is that nine times as many points are sampled in the interval [0,1] as in [1,2].

using KernelDensity
using CairoMakie

m = 1000000
X = vcat(rand(9*m), 1.0 .+ rand(m))   # nine times as many samples in [0,1] as in [1,2]
S = rand(0:1, 10*m)                   # every sample is successful with probability 0.5

boundary = (-2, 4)
kde_total = kde(X; boundary)             # ~ P(X = x)
kde_success = kde(X[S .== 1]; boundary)  # ~ P(X = x | S = 1)

# pointwise ratio of the two density estimates
kde_cond = UnivariateKDE(kde_total.x, kde_success.density ./ kde_total.density)
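
# Note: both kdes integrate to one, so this ratio estimates
# P(S = 1 | X = x) / P(S = 1) (hence the "~" in the plot labels below);
# multiplying by the empirical success rate gives the probability itself:
using Statistics
kde_cond_scaled = UnivariateKDE(kde_cond.x, mean(S) .* kde_cond.density)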

f = let
    f = Figure()
    p1 = lines(f[1,1], kde_total)
    p2 = lines!(f[1,1], kde_success)
    p3 = lines(f[2,1], kde_cond, color = "red")
    inds = 0.0 .<= kde_cond.x .<= 2.0    # restrict to the region that has data
    p4 = lines(f[3,1], kde_cond.x[inds], kde_cond.density[inds], color = "green")
    Legend(f[4,1], [p1.plot, p2, p3.plot, p4.plot],
        ["~ P(X=x)", "~ P(X=x|S=1)", "~ P(S=1|X=x)", "~ P(S=1|X=x) on [0,2]"],
        tellwidth = false, orientation = :horizontal)
    f
end

In this test, the analytical result is that the green line should be constant at 0.5. I couldn’t find any parameters of kde that make this look much better. Are there better statistical methods to compute this? Other Julia packages?

OK, I found one approach that gives me a histogram, which is good enough for me…

using StatsBase
using CairoMakie   # for plot

h_total = fit(Histogram, X)
# reuse h_total's bin edges so both histograms count the same bins
h_success = fit(Histogram, X[S .== 1], h_total.edges[1])
h_cond = Histogram(h_total.edges, h_success.weights ./ h_total.weights)
plot(h_cond)

This approach doesn’t have the instabilities, and since both histograms count the same bins, the ratio of counts directly estimates P(S = 1 | X in bin).

(For people who come across this: there is also Weight Vectors · StatsBase.jl, but I am not sure how to use it for this case.)

Are there better statistical methods to compute this?

This is a long shot for me, as it is very far from my area of expertise, so please take it with a grain of salt. However, how about a Gaussian Mixture Model, which is a parametric probability density function represented as a weighted sum of Gaussian component densities? As far as I understand it, conditional density estimation with mixtures is similar to the kernel density estimation presented in your example and might be better suited. Gaussian Mixture Models were suggested to me for my hobby project, which, as far as I understand, has some similarities with the problem you describe. I am not sure I can provide deep details on this topic, so I am providing a link to Christopher M. Bishop’s book “Pattern Recognition and Machine Learning” [https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf].
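
If I understand the idea correctly, one would fit a mixture to the X values of each class and combine them with Bayes’ rule. A rough sketch (the component parameters below are made up; I use Distributions.jl for the mixture densities, and the actual fitting, e.g. by EM, is omitted):

using Distributions

# made-up class-conditional densities; in practice one would fit them
# (e.g. with EM) to X[S .== 1] and X[S .== 0] separately
f1 = MixtureModel(Normal, [(0.5, 0.3), (1.5, 0.3)], [0.9, 0.1])  # density of X given S = 1
f0 = MixtureModel(Normal, [(0.5, 0.3), (1.5, 0.3)], [0.9, 0.1])  # density of X given S = 0
p1 = 0.5                                                         # prior P(S = 1)

# Bayes’ rule gives P(S = 1 | X = x)
p_success(x) = p1 * pdf(f1, x) / (p1 * pdf(f1, x) + (1 - p1) * pdf(f0, x))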

Edit:

Other Julia packages?

This repo, GaussianMixtures.jl, is directly about Gaussian mixtures; however, it is 5–7 years old, and I am not sure whether it works with current Julia. I also guess there might be some other packages better suited to this problem. If this is of interest to you, I would be curious about your opinion on the topic.

I think you might want to do a logistic polynomial* regression (*not actually a polynomial regression, but with some spline basis or similar; basically the same idea).
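
A minimal sketch of the idea, assuming GLM.jl and DataFrames.jl, with hand-made polynomial columns standing in for a proper spline basis:

using GLM, DataFrames

# polynomial features as a crude stand-in for a spline basis
df = DataFrame(S = S, X = X, X2 = X .^ 2, X3 = X .^ 3)
model = glm(@formula(S ~ X + X2 + X3), df, Binomial(), LogitLink())

# predicted P(S = 1 | X = x) on a grid
xs = range(0, 2, length = 100)
ps = predict(model, DataFrame(X = xs, X2 = xs .^ 2, X3 = xs .^ 3))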

Thanks for the ideas. I will need some time to get into these things, since it’s also not directly my specialisation.

At first glance, I cannot find functions in the GaussianMixtures package where I can pass in two datasets, one which carries the success information and another with the data of interest.


On a more basic level, what I don’t understand is why Weight Vectors · StatsBase.jl doesn’t perform well on this test case (see plot below). I suspect it’s just this extreme example, and there are surely many advantages to the methods from StatsBase.


using StatsBase, LinearAlgebra
using CairoMakie

m = 1000000
X = vcat(rand(9*m), 1.0 .+ rand(m))
S = rand(0:1, 10*m)

# approach 1: ratio of two histograms over the same bin edges
h_total = fit(Histogram, X)
h_success = fit(Histogram, X[S .== 1], h_total.edges[1])
h_cond = Histogram(h_total.edges, h_success.weights ./ h_total.weights)

# approach 2: weight vector (pweights(X) takes the sample values themselves as weights)
w = pweights(X)[S .== 1]
h_weights = fit(Histogram, X[S .== 1], w, nbins = 60)
normalize!(h_cond)   # normalize the ratio histogram to a density

f = begin
    f = Figure()
    p1 = plot(f[1,1], h_cond)
    p1.axis.title[] = "h_success ./ h_total"

    p2 = plot(f[2,1], h_weights)
    p2.axis.title[] = "h_weights"
    f
end

A simple logit or probit model is a possibility, maximizing the sum or average of the log likelihood:

# per-observation log likelihood of a logit model;
# x is the design matrix (e.g. [ones(n) X]) and s the 0/1 outcomes
function logit(theta, s, x)
    p = 1.0 ./ (1.0 .+ exp.(-x * theta))   # model probability that s = 1 given x
    return s .* log.(p) .+ (1.0 .- s) .* log.(1.0 .- p)
end
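
For example, the average log likelihood could be maximized with Optim.jl (a sketch, assuming an intercept-plus-slope design matrix):

using Optim, Statistics

Xdesign = [ones(length(X)) X]                   # intercept + linear term
negll(theta) = -mean(logit(theta, S, Xdesign))  # average negative log likelihood
theta_hat = Optim.minimizer(optimize(negll, zeros(2), BFGS()))

# fitted P(S = 1 | X = x)
p_hat(x) = 1.0 / (1.0 + exp(-(theta_hat[1] + theta_hat[2] * x)))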

Another possibility is to use nonparametric kernel regression, rather than kernel density estimation. The probability that S = 1 given X = x is also the conditional mean of S given X = x, so running a kernel regression might be a little easier.
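
A hand-rolled Nadaraya–Watson estimator would look roughly like this (a sketch; the bandwidth h is made up and would need tuning, and each evaluation is O(n)):

# Nadaraya–Watson estimate of E[S | X = x], which equals P(S = 1 | X = x)
gauss(u) = exp(-u^2 / 2)            # unnormalized Gaussian kernel
function nw(x, X, S; h = 0.05)      # h: bandwidth, chosen by hand here
    w = gauss.((x .- X) ./ h)
    return sum(w .* S) / sum(w)
end

xs = range(0, 2, length = 200)
ps = [nw(x, X, S) for x in xs]      # ≈ P(S = 1 | X = x) on a grid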

@SteffenPL

Thanks for the ideas. I will need some time to get into these things, since it’s also not directly my specialisation.

I’m really sorry, but I am not able to provide additional information. I have some research background; however, it was a long time ago, in the field of competition law.

At first glance, I cannot find functions in the GaussianMixtures package where I can pass in two datasets, one which carries the success information and another with the data of interest.

At first, I was reluctant to provide the link to the GaussianMixtures.jl package; however, I decided to edit the post given the nature of your question. I also added that there might be other packages better suited to your problem.