I have 5 empirical CDFs coming from a model. I’d like to average them and plot that average against the empirical CDF of the data. The issue I’m facing is that the points of support of those 5 CDFs are not the same, so I’d have to use some kind of windowing and average the points within each window. Of course, because I want to end up with an average that can be interpreted as a CDF, I’d need the average to be non-decreasing and between 0 and 1. Is there any package that allows me to do that?
Are you aware of EmpiricalCDFs.jl?
We have also cooked up our own empirical CDF type in TableTransforms.jl.
I wasn’t aware of that. I’m thinking that maybe I should just feed all 5 simulations into a single EmpiricalCDF and see how that goes.
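Whether pooling matches averaging depends on the sample sizes. Concatenating all simulations into one sample weights each simulation by its number of draws, so the pooled ECDF equals the plain average of the per-simulation ECDFs only when every simulation has the same size; otherwise it equals a size-weighted average. A minimal stdlib-only Python sketch to check this; the helpers `ecdf`, `pooled_ecdf`, `average_ecdf` and the toy data are hypothetical and are not the EmpiricalCDFs.jl API:

```python
from bisect import bisect_right

def ecdf(sample):
    """Return x -> fraction of values in `sample` that are <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def pooled_ecdf(samples):
    """ECDF of all simulations concatenated into a single sample."""
    return ecdf([v for s in samples for v in s])

def average_ecdf(samples, weights=None):
    """Weighted average of the per-simulation ECDFs (equal weights by default)."""
    fs = [ecdf(s) for s in samples]
    if weights is None:
        weights = [1.0] * len(fs)
    total = sum(weights)
    return lambda x: sum(w * f(x) for w, f in zip(weights, fs)) / total

# Hypothetical toy data: three "simulations" of equal size...
equal = [[0.1, 0.5, 0.9], [0.2, 0.3, 1.5], [0.4, 0.6, 0.7]]
# ...and the same runs, but with the last one twice as long.
unequal = [[0.1, 0.5, 0.9], [0.2, 0.3, 1.5], [0.4, 0.6, 0.7, 2.0, 2.1, 2.2]]

for x in [0.25, 0.5, 1.0]:
    print(x, pooled_ecdf(equal)(x), average_ecdf(equal)(x))
```

With `unequal`, the pooled ECDF instead matches `average_ecdf(unequal, weights=[len(s) for s in unequal])`, so if the runs differ in length, averaging the ECDFs directly keeps each model run on an equal footing.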
Why not just average the 5 CDFs? The average (which is 1/5 of the sum) is again a non-decreasing function with values in [0,1]. This would also be the mathematically reasonable thing to do.
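Because each empirical CDF is a step function that only jumps at its own sample points, the average can be evaluated exactly on the union of all support points, with no windowing or interpolation. A minimal stdlib-only Python sketch, assuming each model run is simply a list of numbers (the function names and the data are hypothetical):

```python
from bisect import bisect_right

def ecdf_values(sample, grid):
    """Evaluate the ECDF of `sample` (fraction of values <= x) at each x in `grid`."""
    xs = sorted(sample)
    n = len(xs)
    return [bisect_right(xs, x) / n for x in grid]

def average_ecdf_on_grid(samples):
    """Average several ECDFs at the union of all their jump points.

    Each ECDF only changes at its own sample values, so evaluating every
    ECDF on the union of those values captures every jump of the average.
    """
    grid = sorted({v for s in samples for v in s})
    columns = [ecdf_values(s, grid) for s in samples]
    avg = [sum(vals) / len(samples) for vals in zip(*columns)]
    return grid, avg

# Hypothetical stand-in for 5 model runs with different support points.
samples = [
    [1.0, 2.0, 3.0],
    [0.5, 2.5],
    [1.5, 1.5, 4.0, 5.0],
    [2.2],
    [0.1, 3.3, 3.4],
]
grid, avg = average_ecdf_on_grid(samples)
print(list(zip(grid, avg)))
```

Plotting `avg` against `grid` as a right-continuous step function gives the averaged CDF, which can be overlaid directly on the ECDF of the data.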
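More specifically, a CDF is $F(x) = P(X \le x)$, and the average CDF is $F_{\mathrm{avg}}(x) = \frac{1}{5}\sum_{i=1}^{5} P(X_i \le x)$, i.e. the probability of choosing one of $X_1, \dots, X_5$ with equal probability and having the chosen variable satisfy $X \le x$.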
Another interpretation: $F_{\mathrm{avg}}$ is the CDF of a variable $X$ obtained by choosing one of the five distributions uniformly at random and then sampling a value from it (a mixture).
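The two readings agree by the law of total probability. Writing $F_i$ for the CDF of $X_i$ and letting $J$ be an index drawn uniformly from $\{1,\dots,5\}$ independently of the $X_i$ (the symbols $J$ and $F_i$ are introduced here only for this derivation):

```latex
\begin{aligned}
P(X_J \le x) &= \sum_{i=1}^{5} P(J = i)\, P(X_i \le x \mid J = i) \\
             &= \sum_{i=1}^{5} \tfrac{1}{5}\, P(X_i \le x)
              = \frac{1}{5}\sum_{i=1}^{5} F_i(x).
\end{aligned}
```

Since each $F_i$ is non-decreasing with values in $[0,1]$, any convex combination of them is too, which is why the average is itself a valid CDF.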
Yeah, I thought it was a harder problem than that. I also don’t have to worry about the “sample points” of the distribution.
Do you think this is equivalent to taking the much longer route of first getting the PDFs, then working with their Fourier transforms (convolving them, thereby obtaining the distribution of the average), and then obtaining the CDF of that convolution?
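It is not equivalent: convolving the PDFs gives the distribution of the sum (or, after rescaling, the mean) of independent draws from the five models, whereas the average of the CDFs is the CDF of the mixture, whose PDF is simply the pointwise average of the five PDFs. Going through PDFs and convolutions would also be much harder and can introduce more errors.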
If the distributions are initially empirical, some smoothing may be needed to get closer to the underlying generating process.
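One common way to smooth an empirical CDF is a kernel estimate: replace each point mass by a narrow Gaussian, so the CDF becomes an average of normal CDFs centred at the sample points. A minimal stdlib-only Python sketch; the Gaussian kernel, the `bandwidth` value and the data are assumptions chosen for illustration, and on real data the bandwidth would need tuning:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_ecdf(sample, bandwidth):
    """Kernel-smoothed CDF: average of Gaussian CDFs centred at each sample point.

    Each jump of the ECDF becomes a smooth ramp of width ~bandwidth;
    as bandwidth -> 0 this approaches the ordinary ECDF.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    pts = list(sample)
    n = len(pts)
    return lambda x: sum(normal_cdf((x - p) / bandwidth) for p in pts) / n

# Hypothetical data and bandwidth (e.g. pick it by cross-validation in practice).
data = [0.1, 0.4, 0.45, 0.9, 1.3]
F = smoothed_ecdf(data, bandwidth=0.2)
for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, round(F(x), 3))
```

Because the result is again an average of CDFs, smoothing each model's sample this way and then averaging the five smoothed curves still yields a valid CDF.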