In my hobby project I ran into the following problem: I’m given a family of random variables Wₖ for which I need to estimate the cdf:
(the above is a histogram approximation of the pdf from 10^7 samples)
From theoretical considerations I know that
It is clear, though, that for k = 4, 5, 6, ... the pdfs are piecewise functions. For k = 4 the density consists of 4 pieces: linear on (0.63, 0.73), then exponential, another exponential(?), and linear close to w = 1.
Similarly, for k = 5 there are four pieces, but as k grows the density becomes smoother (the pieces are visible up to k = 7 or 8, especially if you plot histograms of sqrt(W)).
Any hints from anybody who has a grasp of statistics would be much appreciated (I’m not doing statistics at all).
Where does it come from?
W is the Shapiro-Wilk W statistic, the one from the Shapiro-Wilk normality test, which I’m trying to compute/estimate more… rigorously.
See https://github.com/JuliaStats/HypothesisTests.jl/pull/124
What have I tried
For k >= 9 I found that there is a normalizing power transform:
This seems OK, but as k grows the MSE of this approximation (computed by fitting the 0.005:0.005:0.995 quantiles) stabilizes at around 0.004, with noticeably thicker tails and a deficit around the mean.
Starting from k >= 28, a log transform
does a better job (it is visibly skewed for k<=20).
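For anyone who wants to reproduce the comparison, here is a minimal sketch of the quantile-based MSE check described above, written in plain Python with the standard library only. It is not the code from the PR: the function name `quantile_mse`, the linear-interpolation quantile rule, and the standardization by sample mean and standard deviation are my assumptions. The demo uses synthetic lognormal data as a stand-in, because computing W itself is out of scope here; the transform is passed in as a callable, so any candidate normalizing transform can be plugged in.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def quantile_mse(samples, transform, probs=None):
    """MSE between the empirical quantiles of the standardized, transformed
    samples and the standard normal quantiles at the same probabilities."""
    if probs is None:
        # The grid 0.005:0.005:0.995 mentioned in the post (199 points).
        probs = [i * 0.005 for i in range(1, 200)]
    z = sorted(transform(x) for x in samples)
    mu = fmean(z)
    sigma = pstdev(z, mu)
    n = len(z)
    std_normal = NormalDist()
    total = 0.0
    for p in probs:
        # Empirical quantile by linear interpolation between order statistics
        # (an assumed convention; other quantile definitions work too).
        h = p * (n - 1)
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)
        q = z[lo] + (h - lo) * (z[hi] - z[lo])
        total += ((q - mu) / sigma - std_normal.inv_cdf(p)) ** 2
    return total / len(probs)


# Synthetic stand-in data: lognormal samples, for which log is an exact
# normalizing transform. Replace with simulated W values and a candidate
# transform to repeat the comparison from the post.
rng = random.Random(0)
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

mse_log = quantile_mse(samples, math.log)
mse_identity = quantile_mse(samples, lambda x: x)
print(f"log transform: {mse_log:.6f}, no transform: {mse_identity:.6f}")
```

A good normalizing transform should give an MSE close to zero, while heavy tails or skewness show up as a larger value, which is the pattern described above for the power transform at large k.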
