Say we have discrete CDF values (percentiles from the field for one metric) such as
percentile 1 => 1.4
…
percentile 10 => 10.3
…
percentile 50 => 50.3
…
percentile 80 => 70.3
…
percentile 100 => 90.3
I am constructing a CDF using linear interpolation, such as
percentile_values = [0.0, 0.01, 0.1,
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0,
10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0,
90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0,
99.9, 99.99, 100.0]
using Interpolations
Xs = [...] # same length as percentile_values
nodes = (percentile_values,)
itp = interpolate(nodes, Xs, Gridded(Linear()))
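For context, differentiating a piecewise-linear CDF only gives the slope on each interval, so the resulting PDF is piecewise constant (a histogram-like step function). Below is a minimal sketch in plain Julia, with the values on the x-axis and the cumulative probability on the y-axis. The toy data and the names `q`, `x`, `bin_density`, and `pdf_piecewise` are hypothetical and only for illustration; the sketch assumes the values are strictly increasing.

```julia
# Hypothetical toy data: cumulative probabilities q in [0, 1] and the values at which they occur.
q = [0.0, 0.1, 0.5, 0.8, 1.0]
x = [0.0, 10.3, 50.3, 70.3, 90.3]   # assumed strictly increasing for this sketch

# Slope of the piecewise-linear CDF on each interval [x[i], x[i+1]]:
# density = Δq / Δx, constant within the interval.
bin_density = diff(q) ./ diff(x)

# Evaluate the piecewise-constant PDF at a point t (zero outside the data range).
function pdf_piecewise(t, x, bin_density)
    (t < first(x) || t > last(x)) && return 0.0
    i = clamp(searchsortedlast(x, t), 1, length(bin_density))
    return bin_density[i]
end

# Sanity check: the density should integrate to q[end] - q[1].
total_mass = sum(bin_density .* diff(x))
```

If two neighbouring values are equal (a flat step in the quantile data), `diff(x)` contains a zero and the slope is infinite there, so ties would need to be treated as point masses or merged before using this approach.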
Questions:
- Can I generate a PDF from this interpolated function using differentiation? If so, can you help me with sample code?
- I have attached standalone sample code that describes my current process (the code uses quantiles in [0,1] instead of percentiles in [0,100], just to adhere to CDF conventions). Can you suggest a more robust process for converting the information in a CDF to a PDF?
- Can you provide suggestions and comments on my current method, which uses Spline1D?
Please refer to the sample code below:
using Dierckx
using Interpolations
using ImageFiltering
import Plots
import StatsPlots
using Plots.PlotMeasures
Plots.gr()
Plots.theme(:ggplot2)
# sample input
q_values = [ 0.0, 0.0001, 0.001,
0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99,
0.999, 0.9999, 1.0 ]
feature_values = [ 9.0, 12.0, 14.0,
23.0, 28.0, 31.0, 33.0, 36.0, 38.0, 39.0, 41.0, 43.0,
45.0, 58.0, 70.0, 80.0, 87.0, 91.0, 94.0, 95.0, 97.0,
97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 98.0, 98.0, 98.0,
99.0, 99.0, 100.0 ]
# TODO task: Generate PDF from discrete quantiles (also called discrete CDF)
p1 = Plots.scatter(feature_values, q_values, markersize=1.75, color="black", label="33 quantile data points")
Plots.plot!(p1, feature_values, q_values, label="method.0) simple CDF",
title = "input discrete CDF of feature value", color=:green, linewidth=2,
ylabel="Cumulative Density [or Quantile]",
xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
border=nothing
)
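# using Spline1D
# interpolate function mapping quantiles -> feature values (the quantile function)
spl = Spline1D(q_values, feature_values, k=1, bc="extrapolate")
# inverse transform sampling: draw uniform quantiles in [0, 1) ...
sample_cdf_q_values1 = rand(25000)
# ... and map them through the quantile function to get samples of the feature value
# (despite the name, these are samples; the PDF is estimated from them by Plots.density below)
pdf_of_feature_values1 = [evaluate(spl, p) for p in sample_cdf_q_values1]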
p2 = Plots.density(pdf_of_feature_values1,
title = "generated PDF of feature value", color=:green, linewidth=2, label="method.0) using Spline1D",
ylabel="Density",
xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
border=nothing
)
# using methods from Interpolations
# https://discourse.julialang.org/t/interpolations-jl-discrete-cdf-to-pdf/60124/8?u=bicepjai
# smoothing to get a more reasonable-looking PDF
smoothed_feature_values = imfilter(feature_values, ImageFiltering.Kernel.gaussian((1,)));
sample_feature_values = 0.01:0.01:100.0 # just a range for plotting
# you can change SteffenMonotonicInterpolation to some other monotonic algorithm from Interpolations.jl
itp_cdf = extrapolate(interpolate(smoothed_feature_values, q_values, SteffenMonotonicInterpolation()), Flat());
Plots.plot!(p1, sample_feature_values, itp_cdf.(sample_feature_values),
color=:orange, linewidth=1, label="imfilter SteffenMonotonicInterpolation CDF")
itp_pdf(x) = Interpolations.gradient1(itp_cdf, x); # this is the generated PDF (derivative of the interpolated CDF)
Plots.plot!(p2, sample_feature_values, itp_pdf.(sample_feature_values),
color=:orange, linewidth=1.5, label="method.1) from Interpolations.gradient1 PDF")
# when adding new methods, update the plots with different colors
# so that it's easier to compare them
l = @Plots.layout([ a{0.5w} b{0.5w} ])
Plots.plot(p1, p2, layout=l,
legend=:topleft,
top_margin=5mm, bottom_margin=5mm, left_margin=5mm,
dpi=200, size=(1000,500), fmt = :png
)
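The Spline1D approach above is a form of inverse transform sampling: uniform draws are pushed through the interpolated quantile function, and the density is then estimated from the samples. Here is a package-free sketch of the same idea, under the assumption that a plain histogram is an acceptable density estimate. The data is a hypothetical subset of the sample values above, and `quantile_linear`, `edges`, and `density` are illustrative names, not part of any library.

```julia
using Random

# Hypothetical subset of the quantile data; note the tie at 97.0.
q = [0.0, 0.1, 0.5, 0.9, 0.95, 1.0]
x = [9.0, 45.0, 87.0, 97.0, 97.0, 100.0]

# Linear interpolation of the quantile function Q(p) for p in [0, 1].
function quantile_linear(p, q, x)
    i = clamp(searchsortedlast(q, p), 1, length(q) - 1)
    w = (p - q[i]) / (q[i+1] - q[i])
    return x[i] + w * (x[i+1] - x[i])
end

# Inverse transform sampling: push uniform draws through Q.
rng = MersenneTwister(42)
samples = [quantile_linear(rand(rng), q, x) for _ in 1:25_000]

# Histogram density estimate on fixed-width bins over [0, 100].
edges = 0.0:5.0:100.0
counts = zeros(Int, length(edges) - 1)
for s in samples
    b = clamp(Int(fld(s - first(edges), step(edges))) + 1, 1, length(counts))
    counts[b] += 1
end

# Normalise so that the bar areas sum to 1.
density = counts ./ (length(samples) * step(edges))
```

Because the quantile function is flat between q = 0.9 and q = 0.95, roughly 5% of the samples land exactly on 97.0. That is a point mass rather than a smooth density, which is also why differentiating the CDF near repeated values produces a sharp spike.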
update: sample code and plots
