In my hobby project I ran into the following problem: I’m given a family of random variables Wₖ, for which I need to estimate the cdf:
(the above is a histogram approximation of the pdf from 10^7 samples)
From theoretical considerations I know that
It is clear, though, that for k = 4, 5, 6, ... the pdfs are piecewise functions, with k=4 consisting of 4 functions: linear in (0.63, 0.73), then exponential, another exponential(?), and linear close to
For k=5 there are four pieces, but as k grows the density becomes smoother (the pieces are visible up to 8, especially if you plot histograms of
Any hints from anybody who has a grasp of statistics would be much appreciated (I’m not doing statistics at all).
Where does it come from?
W is the Shapiro-Wilk W-statistic, the one from the SW normality test, which I’m trying to compute/estimate more… rigorously.
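For anyone who wants to reproduce samples like these, here is a minimal sketch of how they could be simulated. It assumes that k is the sample size and that the data are i.i.d. standard normal (neither is spelled out above). It also relies on SciPy's own `scipy.stats.shapiro` to compute W, so it inherits whatever approximations SciPy makes. The helper names `simulate_w` and `ecdf` are made up for illustration.

```python
import numpy as np
from scipy import stats


def simulate_w(k, n_samples, seed=0):
    """Draw n_samples values of the Shapiro-Wilk W statistic.

    Assumption: each value comes from k i.i.d. standard normal observations.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_samples, k))
    # scipy.stats.shapiro takes one sample at a time and returns
    # (statistic, pvalue); only the statistic is kept here.
    return np.array([stats.shapiro(row)[0] for row in data])


def ecdf(samples):
    """Return a function x -> fraction of samples <= x (empirical cdf)."""
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)

    def cdf(x):
        return np.searchsorted(sorted_samples, x, side="right") / n

    return cdf


w4 = simulate_w(k=4, n_samples=2000)
F4 = ecdf(w4)
print(F4(0.8), F4(0.95))  # empirical cdf of W for k=4 at two points
```

The loop over rows is slow for something like 10^7 samples, so a real run would need a vectorized or compiled computation of W. The sketch is only meant to pin down what is being sampled.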
What have I tried?
For k >= 9 I found that there is a normalizing power transform:
This seems OK, but as k grows the MSE of this approximation (as computed by fitting the 0.005:0.005:0.995 quantiles) stabilizes at around 0.004, with noticeably thicker tails and a deficit around the mean.
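To make the quantile-based error concrete, here is one way such an MSE could be computed. This is a sketch under assumptions, not necessarily the exact procedure used above. It standardizes the transformed samples, takes empirical quantiles on the 0.005:0.005:0.995 grid, and compares them with standard normal quantiles. The function names are hypothetical, and `transform` is a placeholder for whichever power or log transform is being tested. The demo uses lognormal stand-in data rather than real W samples.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def quantile_grid(step=0.005):
    """Probabilities 0.005, 0.010, ..., 0.995."""
    n = round(1 / step)
    return [i * step for i in range(1, n)]


def empirical_quantile(sorted_xs, p):
    """Linearly interpolated quantile of already-sorted data."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def normal_quantile_mse(samples, transform):
    """MSE between quantiles of standardized transform(samples) and N(0, 1)."""
    ys = [transform(x) for x in samples]
    mu, sigma = fmean(ys), pstdev(ys)
    zs = sorted((y - mu) / sigma for y in ys)
    std_normal = NormalDist()
    errors = [
        (empirical_quantile(zs, p) - std_normal.inv_cdf(p)) ** 2
        for p in quantile_grid()
    ]
    return fmean(errors)


# Stand-in data (NOT Shapiro-Wilk samples): lognormal values, for which
# the log transform is exactly normalizing and the identity is not.
rng = random.Random(0)
demo = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]
mse_log = normal_quantile_mse(demo, math.log)
mse_identity = normal_quantile_mse(demo, lambda x: x)
print(mse_log, mse_identity)
```

Because the samples are standardized first, the numbers from this sketch are in units of standard deviations squared. They are not directly comparable with the 0.004 quoted above unless the same convention was used there.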
For k >= 28 a log transform
does a better job (it is visibly skewed for