Pull request #33 on the `StatsFuns` package provides a link to Catherine Loader's note on evaluating the binomial and Poisson probability mass functions. One central part of this is evaluating the error in Stirling's approximation to the log-factorial function. The version used is a series in odd powers of `1/n` with coefficients that are inverses of integers. It starts as

$$\frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \frac{1}{1680n^7} + \dots$$

Are there clever ways of evaluating such a polynomial so that, e.g., switching to `BigInt` would be transparent? The code used by Catherine Loader, and translated more-or-less literally in the PR, memoizes the values for `n` from 1 to 15 (where the error is greatest), then switches to Horner evaluation of polynomials with explicitly evaluated `Float64` coefficients. I think it would be better to retain the integer representation of the inverses of the coefficients.
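Here is a minimal sketch of the kind of thing I mean (my own illustration, not the PR's code, and `stirling_error` is just a name for this example): keep the denominators `12, 360, 1260, 1680` as integers and let Horner's rule run in whatever number type `n` carries, so `Float64`, `BigFloat`, and even `Rational{BigInt}` all go through the same method.

```
# Truncated series 1/(12n) - 1/(360n^3) + 1/(1260n^5) - 1/(1680n^7),
# evaluated by Horner's rule in x = 1/n^2. The coefficient denominators
# stay integers; inv(d * one(x)) forms the reciprocal in the working type.
const STIRLING_DENOMS = (-1680, 1260, -360, 12)   # highest-order term first

function stirling_error(n)
    x = inv(n^2)
    p = zero(x)
    for d in STIRLING_DENOMS
        p = p * x + inv(d * one(x))
    end
    return p / n
end
```

With this, `stirling_error(35.0)` works in `Float64`, `stirling_error(big(35))` promotes to `BigFloat`, and `stirling_error(35//1)` returns the exact rational value of the truncated series.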

A bit more detail on the Stirling approximation: `log(factorial(n))` is approximated as `n * log(n) - n + log(2π*n)/2`, and the series above corrects that base approximation. For example:

```
julia> n = big(35)
35
julia> log(factorial(n))
9.213617560368709248333303629689953216439487104597391356978356285232562502050595e+01
julia> n * log(n) - n + log(2π*n)/2
9.213379471607885835185644929557128163242134296609018476816483085946711233412839e+01
julia> n * log(n) - n + log(2π*n)/2 + inv(12n)
9.213617566845981073280883024795223401337372391847113714911721181184806471508052e+01
julia> n * log(n) - n + log(2π*n)/2 + inv(12n) - inv(360*n^3)
9.2136175603671990842948…e+01
```
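Continuing with the four-term sketch from above (the hypothetical `stirling_error`), the remaining error at `n = 35` is roughly the first omitted term of the series, which is visible here only because `n` is a `BigInt`, so everything evaluates in `BigFloat`:

```
base = n * log(n) - n + log(2π*n)/2
base + stirling_error(n) - log(factorial(n))   # ≈ -1.1e-17, about -1/(1188n^9)
```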

Why not just use the `lgamma` function for the log of the factorial, since this already supports `BigFloat` with arbitrary precision?

You have to be careful with Stirling at arbitrary precision since it is an asymptotic series (i.e. the more precision you want, the bigger `n` needs to be).
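To spell that out, here is the standard remainder bound for the Stirling series (stated from memory, so worth double-checking the constants):

$$\log n! = n\log n - n + \tfrac{1}{2}\log(2\pi n) + \sum_{k=1}^{K}\frac{B_{2k}}{2k(2k-1)\,n^{2k-1}} + R_K(n), \qquad |R_K(n)| \le \frac{|B_{2K+2}|}{(2K+2)(2K+1)\,n^{2K+1}}.$$

Since $|B_{2k}|$ grows like $2\,(2k)!/(2\pi)^{2k}$, the terms at a fixed $n$ shrink only until $k \approx \pi n$ and then blow up, so the attainable error at that $n$ is bounded below by the smallest term; for more digits you must increase $n$ (or shift the argument up with a recurrence before applying the series).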

As I was writing this I too was beginning to think that `lgamma` may be a better approach. Loader argues against it for the evaluation of the binomial pdf because of cancellation. I think it would be worthwhile checking out exactly how well it does relative to the simpler formula using `lgamma`.

This is true. If we define `logp(x,n,p) = lgamma(n+1) - lgamma(x+1) - lgamma(n-x+1) + x*log(p) + (n-x)*log1p(-p)`, then `logp(1e6, 2e6, 0.5)` gives `-7.4801203440874815`, whereas the correct (exactly rounded) answer is `-7.480120346906837`, so in `Float64` precision you get 9 correct digits. (She claims that 7 digits are lost in this example, which is maybe too pessimistic by 1 digit.)
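In runnable form (when this was written `lgamma` was in Base; on current Julia it comes from SpecialFunctions, where it is called `loggamma`):

```
# Log-pmf of Binomial(n, p) at x via lgamma. The three lgamma values are
# on the order of 10^7 while the result is about -7.5, so roughly 7
# digits cancel in Float64.
logp(x, n, p) = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1) +
                x * log(p) + (n - x) * log1p(-p)

logp(1e6, 2e6, 0.5)   # -7.4801203440874815; exactly rounded: -7.480120346906837
```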

However, if you are using `BigFloat` anyway, then you can just throw more digits at it. This seems a lot easier than working out your own arbitrary-precision routine.
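Concretely (a sketch using the same `logp` as above with `BigFloat` arguments):

```
# The cancellation still happens, but with 256-bit (~77-digit) intermediates,
# losing ~7 digits leaves far more than Float64 accuracy in the result.
setprecision(BigFloat, 256) do
    logp(big(1e6), big(2e6), big(0.5))
end
```

(`big(1e6)`, `big(2e6)`, and `big(0.5)` are all exact conversions here, so no accuracy is lost on input.)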

I should also mention that the combination of `lgamma` calls here is closely related to the `lbeta` function. Currently, `lbeta` in Base is computed via the combination of `lgamma` calls, but it might be nice to have a more sophisticated implementation (especially for `Float64`) that uses an asymptotic formula for large arguments.
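To make the relation explicit (my sketch, not the actual Base definition): the binomial-coefficient part of `logp` is an `lbeta` in disguise, and the naive `lbeta` is exactly the cancellation-prone combination above.

```
# Naive lbeta(a, b) = log(Γ(a)Γ(b) / Γ(a+b)) as three lgamma calls.
naive_lbeta(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)

# The log binomial pmf rewritten through lbeta, using
#   log(binomial(n, x)) = -log(n + 1) - lbeta(x + 1, n - x + 1):
logp2(x, n, p) = -log(n + 1) - naive_lbeta(x + 1, n - x + 1) +
                 x * log(p) + (n - x) * log1p(-p)
```

A dedicated `lbeta` could then switch to an asymptotic expansion when both arguments are large, instead of letting the three `lgamma` values cancel.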