In my hobby project I arrived at the following problem: I’m given a family of random variables Wₖ for which I need to estimate the cdf:
(the above is histogram approximation of pdf from 10^7 samples)
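For reference, estimating a cdf directly from simulated draws can be sketched as below. This is only a minimal illustration: the helper name `make_ecdf` is made up here, and the Beta draws are a placeholder on (0, 1), not the actual distribution of Wₖ.

```python
import bisect
import random


def make_ecdf(samples):
    """Return a function w -> fraction of samples that are <= w."""
    xs = sorted(samples)
    n = len(xs)

    def ecdf(w):
        # bisect_right counts how many of the sorted samples are <= w
        return bisect.bisect_right(xs, w) / n

    return ecdf


# Placeholder draws on (0, 1); this is NOT the real W_k distribution,
# it only stands in for whatever simulation produces the samples.
rng = random.Random(0)
placeholder_samples = [rng.betavariate(8, 2) for _ in range(10_000)]

F = make_ecdf(placeholder_samples)
print(F(0.8))  # estimated P(W <= 0.8) for the placeholder draws
```

The empirical cdf is a step function, so it needs no binning choice, unlike the histogram approximation of the pdf.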
From theoretical considerations I know that
It is clear though that for k = 4, 5, 6, ... the pdfs are piecewise functions, with k=4 consisting of 4 functions: linear in (0.63, 0.73), then exponential, another exponential(?), and linear close to w=1.
Similarly, for k=5 there are four pieces, but as k grows the density becomes smoother (the pieces are visible up to k=7 or 8, especially if you plot histograms of sqrt(W)).
Any hints from anybody who has a grasp of statistics would be much appreciated (I’m not doing statistics at all).
Where does it come from?
W is the Shapiro-Wilk W-statistic, the one from the SW normality test, which I’m trying to compute/estimate more… rigorously.
see https://github.com/JuliaStats/HypothesisTests.jl/pull/124
What have I tried
For k >= 9 I found that there is a normalizing power transform:
This seems ok, but as k grows the mse of this approximation (as computed by fitting the 0.005:0.005:0.995 quantiles) stabilizes at around 0.004, with noticeably thicker tails and a deficit around the mean.
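One way such a quantile-based mse could be computed is sketched below. The details are assumptions, not the exact procedure used above: the transformed values are standardized to z-scores, empirical quantiles are taken by nearest rank on the grid 0.005, 0.010, ..., 0.995, and they are compared with standard normal quantiles. The function name `quantile_mse` is hypothetical.

```python
import math
import random
import statistics


def quantile_mse(values, step=0.005):
    """Mean squared gap between empirical and normal quantiles on a grid."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    # Assumption: standardize first, so the result does not depend on scale.
    z = sorted((v - mean) / sd for v in values)
    n = len(z)
    std_normal = statistics.NormalDist()
    n_probs = round(1 / step) - 1  # 199 grid points for step = 0.005
    total = 0.0
    for i in range(1, n_probs + 1):
        p = i / (n_probs + 1)  # 0.005, 0.010, ..., 0.995
        # Nearest-rank empirical quantile of the standardized sample
        idx = min(n - 1, max(0, math.ceil(p * n) - 1))
        total += (z[idx] - std_normal.inv_cdf(p)) ** 2
    return total / n_probs


# Illustration on synthetic data only (not on transformed W values).
rng = random.Random(1)
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
skewed_draws = [rng.expovariate(1.0) for _ in range(20_000)]
mse_normal = quantile_mse(normal_draws)
mse_skewed = quantile_mse(skewed_draws)
print(mse_normal, mse_skewed)
```

Because the grid stops at 0.005 and 0.995, this measure says little about the far tails, so thicker tails may be understated by a single mse number.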
Starting from k >= 28 a log transform does a better job (it is visibly skewed for k<=20).
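To put a number on "visibly skewed", the sample skewness of the transformed values could be checked, roughly as in the sketch below. The transform `log(1 - w)` used here is an assumption for illustration (the exact log transform is not shown above), and the Beta draws are again a placeholder rather than real W samples.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Placeholder draws on (0, 1); NOT the real W_k distribution.
rng = random.Random(2)
placeholder_w = [rng.betavariate(8, 2) for _ in range(10_000)]

# Hypothetical transform, assumed for illustration only.
transformed = [math.log(1.0 - w) for w in placeholder_w]

print(sample_skewness(placeholder_w), sample_skewness(transformed))
```

A value close to 0 after the transform would be consistent with the transform doing its job; tracking it across k would show where the skew stops being noticeable.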