LogNormal calculation of standard deviation gets two different answers

I noticed that calculating the standard deviation of a LogNormal distribution directly or from a very large sample yields quite different results. This is not true for the mean.

mulog = -12.0
sdlog = 3.0

dist = Distributions.LogNormal(mulog, sdlog)
y = rand(dist, 10000000);

mean(dist) # 0.0005530843701478336
mean(y) # 0.0005622805244076315

std(dist) # 0.04978399616689943
std(y) # 0.028833362676711747

I do not think this is due to rounding errors, since the sample is huge. Am I wrong?

Which packages are used ? I try to replicate the code and get error.

Sorry for the omission. The package Distributions is needed in the following line:

dist = Distributions.LogNormal(mulog, sdlog)

Maybe it’s the revenge of the fat-tailed distribution:

julia> using UnicodePlots

julia> histogram(y)
                β”Œ                                        ┐ 
   [ 0.0,  2.0) β”€β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  9 999 878  
   [ 2.0,  4.0) ─▏ 93                                      
   [ 4.0,  6.0) ─▏ 11                                      
   [ 6.0,  8.0) ─▏ 6                                       
   [ 8.0, 10.0) ─▏ 4                                       
   [10.0, 12.0) ─▏ 3                                       
   [12.0, 14.0) ─▏ 2                                       
   [14.0, 16.0) ─  0                                       
   [16.0, 18.0) ─  0                                       
   [18.0, 20.0) ─▏ 1                                       
   [20.0, 22.0) ─  0                                       
   [22.0, 24.0) ─  0                                       
   [24.0, 26.0) ─  0                                       
   [26.0, 28.0) ─  0                                       
   [28.0, 30.0) ─▏ 1                                       
   [30.0, 32.0) ─  0                                       
   [32.0, 34.0) ─  0                                       
   [34.0, 36.0) ─  0                                       
   [36.0, 38.0) ─  0                                       
   [38.0, 40.0) ─▏ 1                                       
                β””                                        β”˜ 
                                 Frequency                 

See the above very skewed histogram.
Compare with:

julia> mulog=1.0
1.0

julia> sdlog=1.0
1.0

julia> dist = Distributions.LogNormal(mulog, sdlog)
LogNormal{Float64}(ΞΌ=1.0, Οƒ=1.0)

julia> y = rand(dist, 10000000);

julia> std(y)
5.872315430987349

julia> std(dist)
5.874743663340262

which has a less skewed histogram.

3 Likes

Based on my inspection, the implementation the Log Normal distribution appears to be correct. I believe Dan is correct about the heavy tails introducing uncertainty into the estimates of the mean and standard deviation. One example of a heavy-tailed distribution without a defined variance is the Cauchy distribution.

1 Like

The sample variance (and standard deviation) is a statistic and as such it has a convergence rate. In tame distributions, this rate depends on the 4th moment.
Doing the relevant searches for the exact expressions, I found the following:

# variance of sample variance
# m4 = 4th moment
# s4 = standard deviation ^ 4
# n = number of samples
vars2(m4,s4,n) = m4/n - s4*(n-3)/n/(n-1)

# LogNormal 4th moment given mu and sd
lognormal_m4(mulog, sdlog) = exp(4*mulog + (1/2)*16*sdlog^2)
# LogNormal standard deviation ^ 4 given mu and sd
lognormal_s4(mulog, sdlog) = var(Distributions.LogNormal(mulog, sdlog))^2

# Calculation relevant to post
vars2(lognormal_m4(-12.0, 3.0), lognormal_s4(-12.0, 3.0),1000000) #  = 26489.122129843465

The result 26489.1(…) is clearly too large to get an accurate sample standard deviation.
For mu=1.0 sd=1.0, it gives 0.16 with same n, enough to get accuracy for the population sd of 5.87(…).

This was a bit of a hasty calculation/google, but the results make sense.