Interpolations.jl Discrete CDF to PDF

bicepjai · April 27, 2021, 7:45pm

Say we have discrete CDF values (percentiles from the field for one metric) such as

percentile 1 => 1.4
…
percentile 10 => 10.3
…
percentile 50 => 50.3
…
percentile 80 => 70.3
…
percentile 100 => 90.3

I am construction a CDF using linear interpolation such as

percentile_values = [
[0.0, 0.01, 0.1, 
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 
10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 
90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 
99.9, 99.99, 100.0]

using Interpolations
Xs = [...] # same length as percentile_values

nodes = (percentile_values,)
itp = interpolate(nodes, Xs, Gridded(Linear()))

Questions:

Can i generate PDF from this interpolated function using differentiation ? If so can you help me with sample code ?
I have attached standalone sample code that describes my current process ( code uses quantiles [0,1] instead of percentiles [0,100] just to adhere to CDF rules), can you suggest more robust process to convert information from CDF to PDF ?
Can you provide suggestions an comments on the current method used by me using Spline1D ?

Please refer sample code

using Dierckx
using Interpolations
using ImageFiltering

import Plots
import StatsPlots
using Plots.PlotMeasures
Plots.gr()
Plots.theme(:ggplot2)


# sample input
q_values = [ 0.0, 0.0001, 0.001, 
    0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 
    0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 
    0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 
    0.999, 0.9999, 1.0 ]

feature_values = [ 9.0, 12.0, 14.0, 
    23.0, 28.0, 31.0, 33.0, 36.0, 38.0, 39.0, 41.0, 43.0, 
    45.0, 58.0, 70.0, 80.0, 87.0, 91.0, 94.0, 95.0, 97.0, 
    97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 98.0, 98.0, 98.0, 
    99.0, 99.0, 100.0 ]

# TODO task: Generate PDF from discrete quantiles (also called discrete CDF)
p1 = Plots.scatter(feature_values, q_values, markersize=1.75, color="black", label="33 quantile data points")

Plots.plot!(p1, feature_values, q_values, label="method.0) simple CDF",
    title = "input discrete CDF of feature value", color=:green, linewidth=2,
    ylabel="Cumulative Density [or Quantile]",
    xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
    border=nothing
)


# using Spline1D
# interpolate function mapping quantiles -> feature values
spl = Spline1D(q_values, feature_values, k=1, bc="extrapolate")
sample_cdf_q_values1 = [rand() for p in 1:25000]
pdf_of_feature_values1 = [evaluate(spl, p) for p in sample_cdf_q_values1]

p2 = Plots.density(pdf_of_feature_values1,
    title = "generated PDF of feature value", color=:green, linewidth=2, label="method.0) using Spline1D",
    ylabel="Density",
    xlim=(-5,105), xlabel="Feature value ∈ [0,100]",
    border=nothing
)

# using methods from Interpolations
# https://discourse.julialang.org/t/interpolations-jl-discrete-cdf-to-pdf/60124/8?u=bicepjai
# smoothing to get a more reasonable-looking PDF
smoothed_feature_values = imfilter(feature_values, ImageFiltering.Kernel.gaussian((1,)));
sample_feature_values = 0.01:0.01:100.0 # just a range for plotting

# you can change SteffenMonotonicInterpolation to some other monotonic algorithm from Interpolations.jl
itp_cdf = extrapolate(interpolate(smoothed_feature_values, q_values, SteffenMonotonicInterpolation()), Flat());
Plots.plot!(p1, sample_feature_values, itp_cdf.(sample_feature_values),
    color=:orange, linewidth=1, label="imfilter SteffenMonotonicInterpolation CDF")

itp_pdf(x) = Interpolations.gradient1(itp_cdf, x); # this is the PDF generate
Plots.plot!(p2, sample_feature_values, itp_pdf.(sample_feature_values),
    color=:orange, linewidth=1.5, label="method.1) from Interpolations.gradient1 PDF")

# using new methods update the plots with different colors
# so that its easier for comparision


l = @Plots.layout([ a{0.5w} b{0.5w} ])
Plots.plot(p1, p2, layout=l,
    legend=:topleft,
    top_margin=5mm, bottom_margin=5mm, left_margin=5mm,
    dpi=200, size=(1000,500), fmt = :png
)

References:

update: sample code and plots

mkitti · April 27, 2021, 8:01pm

You would probably want to do some smoothing over the CDF before trying to extract the PDF. Interpolations will generate a curve that goes through every point. What you want to do is fit a smooth curve through the points.

Rather than interpolating, I suggesting doing curve fitting:

mateuszbaran · April 27, 2021, 9:17pm

You can also try monotonic interpolation, which has some differentiable options: Interpolations.jl/monotonic.jl at master · JuliaMath/Interpolations.jl · GitHub (it’s sadly not in the docs yet). You can then get the PDF using the gradient1 function.

bicepjai · April 27, 2021, 9:32pm

Thanks for the suggestion. What model would one use for CDF ? exponential as shown in examples ?

bicepjai · April 27, 2021, 9:39pm

I tried going thru the code and was completely lost do you know any alternate packages where i dont have to understand implementation details ?

lrnv · April 27, 2021, 9:56pm

I do not know how your interpolation works, but the theory is that:

1° If your cdf hasright and left limits respectively 0 and 1
2° If your cdf is strictly increasising and continuous
Then you can simply use a derivative.

The first thing to do would be to plot the interpolation results to see what it looks Like: if it is, e.g, piecewise linear, you should be able to find the piecewise derivative easily: this would be the piecewise constant value of the PDF.

Edit: if it is not piecewise linear, the package documentation says that some interpolation support a gradient function, which would be your PDF : General usage · Interpolations.jl

bicepjai · April 28, 2021, 12:05am

This is what i was doing in python and ended up converting it to julia

spl = Spline1D(cdf_y, cdf_x, k=1, bc="extrapolate")
sample_cdf_ys = [100*rand() for p in 1:25000]
Plots.histogram([evaluate(spl, p) for p in sample_cdf_ys], color="red", label="spline cdf")

I know this is very approximate distribution from the CDF. I was looking for more accurate models for converting from discrete CDF (percentiles / quantiles ) to PDF

Thanks for the amazing contributions by these packages. Constructive feedback.

Interpolations.jl has great breadth; its interface can be better (docs can be better)
I ended up using Dierckx.jl ; its simple to use and understand on first pass
Good reference for LsqFit.jl can be found here

mateuszbaran · April 28, 2021, 9:27am

Here, I’ve made an example for Interpolations.jl:

julia> using Interpolations, Plots, ImageFiltering

julia> percentile_values = [0.0, 0.01, 0.1, 
       1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 
       10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 
       90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 
       99.9, 99.99, 100.0];

julia> y = sort(randn(length(percentile_values))); # just some random data

julia> smoothed_y = imfilter(y, ImageFiltering.Kernel.gaussian((3,))); # smoothing to get a more reasonable-looking PDF

julia> itp_cdf = extrapolate(interpolate(smoothed_y, percentile_values, SteffenMonotonicInterpolation()), Flat()); # you can change SteffenMonotonicInterpolation to some other monotonic algorithm from Interpolations.jl

julia> itp_pdf(t) = Interpolations.gradient1(itp_cdf, t); # this is the PDF generate

julia> t = -3.0:0.01:3.0 # just a range for plotting

julia> plot(t, itp_cdf.(t)) # plotting the CDF

julia> plot(t, itp_pdf.(t)) # plotting the PDF

I’ll try to improve docs for Interpolations.jl soon.

bicepjai · May 2, 2021, 1:50am

@mateuszbaran I have updated the question with results of my method (simple linear interpolation using Spline1D) in green. I also added results of your suggested method. The generated PDF looks better using my simple method. Do provide comments and suggestions.

mateuszbaran · May 3, 2021, 2:11pm

Yes, sure, your PDF definitely looks better but it is actually generated using https://github.com/JuliaStats/KernelDensity.jl – which I recommend in this case over my solution . Just note that integrating the green curve on the PDF plot won’t exactly give you the green CDF from the left figure.

mkitti · May 4, 2021, 5:51pm

PRs are always welcome at Interpolations.jl.

Topic		Replies	Views
Empirical distribution type for continuous variables Statistics question , proposal	18	4795	February 14, 2023
CDF of a copula from Copulas.jl? Statistics question , package	24	896	August 19, 2022
[ANN] NumericalDistributions.jl: user-defined distributions Statistics package , announcement	1	200	April 13, 2025
Updating documentation for Interpolations.jl v0.14 Community package , documentation , interpolations	13	1243	June 15, 2023
Opposite of StatsBase.percentile Statistics	5	380	November 21, 2022

Interpolations.jl Discrete CDF to PDF

Related topics