# LogNormal calculation of standard deviation gets two different answers

I noticed that calculating the standard deviation of a LogNormal distribution directly or from a very large sample yields quite different results. This is not true for the mean.

``````mulog = -12.0
sdlog = 3.0

dist = Distributions.LogNormal(mulog, sdlog)
y = rand(dist, 10000000);

mean(dist) # 0.0005530843701478336
mean(y) # 0.0005622805244076315

std(dist) # 0.04978399616689943
std(y) # 0.028833362676711747
``````

I do not think this is due to rounding errors, since the sample is huge. Am I wrong?

Which packages are used ? I try to replicate the code and get error.

Sorry for the omission. The package `Distributions` is needed in the following line:

``````dist = Distributions.LogNormal(mulog, sdlog)
``````

Maybe it’s the revenge of the fat-tailed distribution:

``````julia> using UnicodePlots

julia> histogram(y)
┌                                        ┐
[ 0.0,  2.0) ┤█████████████████████████████  9 999 878
[ 2.0,  4.0) ┤▏ 93
[ 4.0,  6.0) ┤▏ 11
[ 6.0,  8.0) ┤▏ 6
[ 8.0, 10.0) ┤▏ 4
[10.0, 12.0) ┤▏ 3
[12.0, 14.0) ┤▏ 2
[14.0, 16.0) ┤  0
[16.0, 18.0) ┤  0
[18.0, 20.0) ┤▏ 1
[20.0, 22.0) ┤  0
[22.0, 24.0) ┤  0
[24.0, 26.0) ┤  0
[26.0, 28.0) ┤  0
[28.0, 30.0) ┤▏ 1
[30.0, 32.0) ┤  0
[32.0, 34.0) ┤  0
[34.0, 36.0) ┤  0
[36.0, 38.0) ┤  0
[38.0, 40.0) ┤▏ 1
└                                        ┘
Frequency

``````

See the above very skewed histogram.
Compare with:

``````julia> mulog=1.0
1.0

julia> sdlog=1.0
1.0

julia> dist = Distributions.LogNormal(mulog, sdlog)
LogNormal{Float64}(μ=1.0, σ=1.0)

julia> y = rand(dist, 10000000);

julia> std(y)
5.872315430987349

julia> std(dist)
5.874743663340262
``````

which has a less skewed histogram.

3 Likes

Based on my inspection, the implementation the Log Normal distribution appears to be correct. I believe Dan is correct about the heavy tails introducing uncertainty into the estimates of the mean and standard deviation. One example of a heavy-tailed distribution without a defined variance is the Cauchy distribution.

1 Like

The sample variance (and standard deviation) is a statistic and as such it has a convergence rate. In tame distributions, this rate depends on the 4th moment.
Doing the relevant searches for the exact expressions, I found the following:

``````# variance of sample variance
# m4 = 4th moment
# s4 = standard deviation ^ 4
# n = number of samples
vars2(m4,s4,n) = m4/n - s4*(n-3)/n/(n-1)

# LogNormal 4th moment given mu and sd
lognormal_m4(mulog, sdlog) = exp(4*mulog + (1/2)*16*sdlog^2)
# LogNormal standard deviation ^ 4 given mu and sd
lognormal_s4(mulog, sdlog) = var(Distributions.LogNormal(mulog, sdlog))^2

# Calculation relevant to post
vars2(lognormal_m4(-12.0, 3.0), lognormal_s4(-12.0, 3.0),1000000) #  = 26489.122129843465
``````

The result 26489.1(…) is clearly too large to get an accurate sample standard deviation.
For mu=1.0 sd=1.0, it gives 0.16 with same `n`, enough to get accuracy for the population sd of 5.87(…).

This was a bit of a hasty calculation/google, but the results make sense.