Interpolations.jl Discrete CDF to PDF

Say we have discrete CDF values (percentiles from the field for one metric) such as

percentile 1 => 1.4

percentile 10 => 10.3

percentile 50 => 50.3

percentile 80 => 70.3

percentile 100 => 90.3

I am construction a CDF using linear interpolation such as

percentile_values = [
[0.0, 0.01, 0.1, 
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 
10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 
90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 
99.9, 99.99, 100.0]

using Interpolations
Xs = [...] # same length as percentile_values

nodes = (percentile_values,)
itp = interpolate(nodes, Xs, Gridded(Linear()))

Questions:

  1. Can i generate PDF from this interpolated function using differentiation ? If so can you help me with sample code ?
  2. I have attached standalone sample code that describes my current process ( code uses quantiles [0,1] instead of percentiles [0,100] just to adhere to CDF rules), can you suggest more robust process to convert information from CDF to PDF ?
  3. Can you provide suggestions an comments on the current method used by me using Spline1D ?

Please refer sample code

using Dierckx
using Interpolations
using ImageFiltering

import Plots
import StatsPlots
using Plots.PlotMeasures
Plots.gr()
Plots.theme(:ggplot2)


# sample input
q_values = [ 0.0, 0.0001, 0.001, 
    0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 
    0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 
    0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 
    0.999, 0.9999, 1.0 ]

feature_values = [ 9.0, 12.0, 14.0, 
    23.0, 28.0, 31.0, 33.0, 36.0, 38.0, 39.0, 41.0, 43.0, 
    45.0, 58.0, 70.0, 80.0, 87.0, 91.0, 94.0, 95.0, 97.0, 
    97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 98.0, 98.0, 98.0, 
    99.0, 99.0, 100.0 ]

# TODO task: Generate PDF from discrete quantiles (also called discrete CDF)
p1 = Plots.scatter(feature_values, q_values, markersize=1.75, color="black", label="33 quantile data points")

Plots.plot!(p1, feature_values, q_values, label="method.0) simple CDF",
    title = "input discrete CDF of feature value", color=:green, linewidth=2,
    ylabel="Cumulative Density [or Quantile]",
    xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
    border=nothing
)


# using Spline1D
# interpolate function mapping quantiles -> feature values
spl = Spline1D(q_values, feature_values, k=1, bc="extrapolate")
sample_cdf_q_values1 = [rand() for p in 1:25000]
pdf_of_feature_values1 = [evaluate(spl, p) for p in sample_cdf_q_values1]

p2 = Plots.density(pdf_of_feature_values1,
    title = "generated PDF of feature value", color=:green, linewidth=2, label="method.0) using Spline1D",
    ylabel="Density",
    xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
    border=nothing
)

# using methods from Interpolations
# https://discourse.julialang.org/t/interpolations-jl-discrete-cdf-to-pdf/60124/8?u=bicepjai
# smoothing to get a more reasonable-looking PDF
smoothed_feature_values = imfilter(feature_values, ImageFiltering.Kernel.gaussian((1,)));
sample_feature_values = 0.01:0.01:100.0 # just a range for plotting

# you can change SteffenMonotonicInterpolation to some other monotonic algorithm from Interpolations.jl
itp_cdf = extrapolate(interpolate(smoothed_feature_values, q_values, SteffenMonotonicInterpolation()), Flat());
Plots.plot!(p1, sample_feature_values, itp_cdf.(sample_feature_values),
    color=:orange, linewidth=1, label="imfilter SteffenMonotonicInterpolation CDF")

itp_pdf(x) = Interpolations.gradient1(itp_cdf, x); # this is the PDF generate
Plots.plot!(p2, sample_feature_values, itp_pdf.(sample_feature_values),
    color=:orange, linewidth=1.5, label="method.1) from Interpolations.gradient1 PDF")

# using new methods update the plots with different colors
# so that its easier for comparision


l = @Plots.layout([ a{0.5w} b{0.5w} ])
Plots.plot(p1, p2, layout=l,
    legend=:topleft,
    top_margin=5mm, bottom_margin=5mm, left_margin=5mm,
    dpi=200, size=(1000,500), fmt = :png
)

References:

  1. probability - Find the PDF from quantiles - Cross Validated
  2. General usage · Interpolations.jl

update: sample code and plots

You would probably want to do some smoothing over the CDF before trying to extract the PDF. Interpolations will generate a curve that goes through every point. What you want to do is fit a smooth curve through the points.

Rather than interpolating, I suggesting doing curve fitting:

2 Likes

You can also try monotonic interpolation, which has some differentiable options: Interpolations.jl/monotonic.jl at master · JuliaMath/Interpolations.jl · GitHub (it’s sadly not in the docs yet). You can then get the PDF using the gradient1 function.

2 Likes

Thanks for the suggestion. What model would one use for CDF ? exponential as shown in examples ?

I tried going thru the code and was completely lost :slight_smile: do you know any alternate packages where i dont have to understand implementation details ?

I do not know how your interpolation works, but the theory is that:

1° If your cdf hasright and left limits respectively 0 and 1
2° If your cdf is strictly increasising and continuous
Then you can simply use a derivative.

The first thing to do would be to plot the interpolation results to see what it looks Like: if it is, e.g, piecewise linear, you should be able to find the piecewise derivative easily: this would be the piecewise constant value of the PDF.

Edit: if it is not piecewise linear, the package documentation says that some interpolation support a gradient function, which would be your PDF : General usage · Interpolations.jl

This is what i was doing in python and ended up converting it to julia

spl = Spline1D(cdf_y, cdf_x, k=1, bc="extrapolate")
sample_cdf_ys = [100*rand() for p in 1:25000]
Plots.histogram([evaluate(spl, p) for p in sample_cdf_ys], color="red", label="spline cdf")

I know this is very approximate distribution from the CDF. I was looking for more accurate models for converting from discrete CDF (percentiles / quantiles ) to PDF

Thanks for the amazing contributions by these packages. Constructive feedback.

  1. Interpolations.jl has great breadth; its interface can be better (docs can be better)
  2. I ended up using Dierckx.jl ; its simple to use and understand on first pass
  3. Good reference for LsqFit.jl can be found here
2 Likes

Here, I’ve made an example for Interpolations.jl:

julia> using Interpolations, Plots, ImageFiltering

julia> percentile_values = [0.0, 0.01, 0.1, 
       1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 
       10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 
       90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 
       99.9, 99.99, 100.0];

julia> y = sort(randn(length(percentile_values))); # just some random data

julia> smoothed_y = imfilter(y, ImageFiltering.Kernel.gaussian((3,))); # smoothing to get a more reasonable-looking PDF

julia> itp_cdf = extrapolate(interpolate(smoothed_y, percentile_values, SteffenMonotonicInterpolation()), Flat()); # you can change SteffenMonotonicInterpolation to some other monotonic algorithm from Interpolations.jl

julia> itp_pdf(t) = Interpolations.gradient1(itp_cdf, t); # this is the PDF generate

julia> t = -3.0:0.01:3.0 # just a range for plotting

julia> plot(t, itp_cdf.(t)) # plotting the CDF

julia> plot(t, itp_pdf.(t)) # plotting the PDF

I’ll try to improve docs for Interpolations.jl soon.

2 Likes

@mateuszbaran I have updated the question with results of my method (simple linear interpolation using Spline1D) in green. I also added results of your suggested method. The generated PDF looks better using my simple method. Do provide comments and suggestions.

Yes, sure, your PDF definitely looks better but it is actually generated using GitHub - JuliaStats/KernelDensity.jl: Kernel density estimators for Julia – which I recommend in this case over my solution :slightly_smiling_face: . Just note that integrating the green curve on the PDF plot won’t exactly give you the green CDF from the left figure.

1 Like

PRs are always welcome at Interpolations.jl.

1 Like