Can anyone tell me what this distribution is?

In my hobby project I arrived at the following problem: I’m given a family of random variables Wₖ for which I need to estimate the cdf:

[image: W_histograms]

(the above is a histogram approximation of the pdfs from 10^7 samples)

From theoretical considerations I know that

cdf(W₃)(w) = \frac{6}{\pi} \left(\sin^{-1}{\sqrt{w}} - \sin^{-1}{\sqrt{\frac{3}{4}}} \right)
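The closed form above can be sanity-checked by simulation. Below is a minimal sketch in Python (standard library only), assuming that for samples of size 3 the Shapiro-Wilk coefficients are (-1/√2, 0, 1/√2), so the numerator of W reduces to (max - min)²/2. The function names are made up for illustration and are not part of HypothesisTests.jl.

```python
import math
import random


def shapiro_wilk_w3(x):
    """W statistic for a sample of size 3.

    Assumption: the n = 3 coefficients are (-1/sqrt(2), 0, 1/sqrt(2)),
    so the numerator is (max - min)^2 / 2.
    """
    lo, _, hi = sorted(x)
    mean = sum(x) / 3
    ss = sum((xi - mean) ** 2 for xi in x)  # sum of squared deviations
    return (hi - lo) ** 2 / (2 * ss)


def cdf_w3(w):
    """Closed-form cdf of W_3 from the post, for 3/4 <= w <= 1."""
    return 6 / math.pi * (math.asin(math.sqrt(w)) - math.asin(math.sqrt(0.75)))


def empirical_cdf_w3(w, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(W_3 <= w) from standard normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if shapiro_wilk_w3(sample) <= w:
            hits += 1
    return hits / n_samples
```

Comparing `cdf_w3(w)` with `empirical_cdf_w3(w)` at a few points in (0.75, 1) should show agreement up to Monte Carlo noise.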

It is clear, though, that for k = 4, 5, 6, ... the pdfs are piecewise functions. For k=4 the pdf consists of 4 pieces: linear in (0.63, 0.73), then exponential, another exponential(?), and linear close to w=1.
Similarly, for k=5 there are four pieces, but as k grows the density becomes smoother (the pieces are visible up to k=7 or 8, especially if you plot histograms of sqrt(W)).

Any hints from anybody who has a grasp of statistics would be much appreciated (I’m not doing statistics at all :wink:).

Where does it come from?

W is the Shapiro-Wilk W-statistic, the one from the Shapiro-Wilk normality test, which I’m trying to compute/estimate more… rigorously.
See https://github.com/JuliaStats/HypothesisTests.jl/pull/124

What I have tried

For k >= 9 I found that there is a normalizing power transform:

W_k \mapsto (1-W_k)^t \sim \mathcal{N}(\mu_k, \sigma_k)

This seems OK, but as k grows the MSE of this approximation (computed by fitting the 0.005:0.005:0.995 quantiles) stabilizes at around 0.004, with noticeably thicker tails and a deficit around the mean.

Starting from k >= 28, a log transform

W_k \mapsto \log(1 - W_k) \sim \mathcal{N}(\mu_k, \sigma_k)

does a better job (it is visibly skewed for k <= 20).
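For anyone who wants to reproduce this kind of check, here is a rough sketch in Python (standard library only) of a quantile-based MSE on the 0.005:0.005:0.995 grid. It is not necessarily the same fit as described above: it assumes μ and σ are estimated by the sample mean and standard deviation, and it measures the error on the standardized scale so that different transforms are comparable. The names `quantile_mse`, `power_transform`, `log_transform` and `synthetic_w` are hypothetical, and the demo data is synthetic, not actual Shapiro-Wilk values.

```python
import math
import random
from statistics import NormalDist, fmean, quantiles, stdev


def quantile_mse(w_samples, transform):
    """MSE between standardized empirical quantiles of transform(W) and
    standard normal quantiles at p = 0.005, 0.010, ..., 0.995.

    Assumption: mu and sigma are the sample mean and standard deviation.
    """
    y = [transform(w) for w in w_samples]
    mu, sigma = fmean(y), stdev(y)
    # quantiles(..., n=200) returns the 199 cut points at p = i/200
    empirical = [(q - mu) / sigma for q in quantiles(y, n=200)]
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf(i / 200) for i in range(1, 200)]
    return fmean((e - q) ** 2 for e, q in zip(empirical, theoretical))


def power_transform(t):
    """Return the map w -> (1 - w)^t for a given exponent t."""
    return lambda w: (1.0 - w) ** t


def log_transform(w):
    """The map w -> log(1 - w)."""
    return math.log(1.0 - w)


def synthetic_w(n, seed=0):
    """Placeholder data, NOT Shapiro-Wilk values: W = 1 - exp(Z) with
    Z ~ N(-3, 0.5), so that log(1 - W) is exactly normal by construction."""
    rng = random.Random(seed)
    return [1.0 - math.exp(rng.gauss(-3.0, 0.5)) for _ in range(n)]
```

With real samples of W_k in place of `synthetic_w`, one would compare `quantile_mse(w, power_transform(t))` against `quantile_mse(w, log_transform)` for each k; on the synthetic data the log transform gives the smaller error by construction.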
