I have samples X::Vector{Float64} and labels S = rand(0:1, 1000). I call X[i] successful if S[i] == 1.
I would like to estimate the probability that a sample is successful given its value, i.e. p_x = P(S=1 | X=x).
Now, I don’t know how to do this with data.
I’m sure this is a standard problem, but I don’t know what to search for…
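In formulas, what I am after can be written with Bayes' rule as follows. The symbols f_X (density of all samples) and f_{X|S=1} (density of the successful samples) are my own notation, introduced only for this sketch:

$$
P(S=1 \mid X=x) = \frac{f_{X \mid S=1}(x)\, P(S=1)}{f_X(x)}
$$

So a ratio of two density estimates would still have to be scaled by the overall success fraction P(S=1), which can be estimated as the share of samples with S[i] == 1.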
Here is my attempt so far.
In this example, p_x = 0.5 for all points, but the challenge is that there are more points sampled in the interval [0,1] than in [1,2].
using KernelDensity
using CairoMakie
m = 1000000
X = vcat(rand(9*m), 1.0 .+ rand(m))
S = rand(0:1,10*m)
boundary = (-2,4)
kde_total = kde(X; boundary)
kde_success = kde(X[S .== 1]; boundary)
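# Bayes' rule: P(S=1|X=x) = p(x|S=1) * P(S=1) / p(x), so the density ratio needs the prior P(S=1)
p_success = count(==(1), S) / length(S)
kde_cond = UnivariateKDE(kde_total.x, p_success .* kde_success.density ./ kde_total.density)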
f = let
    f = Figure()
    p1 = lines(f[1,1], kde_total)
    p2 = lines!(f[1,1], kde_success)
    p3 = lines(f[2,1], kde_cond, color = "red")
    # restrict the plot to the support of the data
    inds = 0.0 .<= kde_cond.x .<= 2.0
    p4 = lines(f[3,1], kde_cond.x[inds], kde_cond.density[inds], color = "green")
    Legend(f[4,1],
        [p1.plot, p2, p3.plot, p4.plot],
        ["~ P(X=x)", "~ P(X=x|S=1)", "~ P(S=1|X=x)", "~ P(S=1|X=x) on [0,2]"],
        tellwidth = false, orientation = :horizontal)
    f
end
In this test, the analytical result is that the green line should be a flat, horizontal line, since p_x is constant. I couldn’t find any parameters of kde that make this look much better. Are there better statistical methods to compute this? Other Julia packages?
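For comparison, the simplest baseline I can think of is a binned success fraction: count successes and total samples per bin and divide. Below is a minimal sketch of that idea, not a recommendation. The helper name binned_success_rate, the bin count nbins, and the smaller sample size are all made up for illustration, and it assumes equal-width bins on [0,2].

```julia
using Random

# Fraction of successful samples per equal-width bin on [lo, hi].
# Hypothetical helper for illustration; not part of any package.
function binned_success_rate(X, S; lo = 0.0, hi = 2.0, nbins = 20)
    total = zeros(Int, nbins)
    success = zeros(Int, nbins)
    width = (hi - lo) / nbins
    for (x, s) in zip(X, S)
        lo <= x <= hi || continue                         # skip samples outside [lo, hi]
        b = min(floor(Int, (x - lo) / width) + 1, nbins)  # x == hi falls into the last bin
        total[b] += 1
        success[b] += (s == 1)
    end
    centers = [lo + (i - 0.5) * width for i in 1:nbins]
    return centers, success ./ total                      # NaN for empty bins
end

Random.seed!(1)
m = 100_000                                # smaller than in my test above, to keep it quick
X = vcat(rand(9 * m), 1.0 .+ rand(m))      # 9x more samples in [0,1] than in [1,2]
S = rand(0:1, 10 * m)
centers, rate = binned_success_rate(X, S)
```

Because each bin is normalized by its own sample count, the uneven sampling between [0,1] and [1,2] only affects the noise level per bin, not the level of the estimate. The price is the usual histogram trade-off between bin width and variance.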