Evaluation of the error term in Stirling's formula for factorial

Pull request #33 on the StatsFuns package provides a link to Catherine Loader’s note on evaluating the binomial and Poisson probability mass function. One central part to this is evaluating the error in Stirling’s approximation to the log-factorial function. The version used is odd powers of 1/n with coefficients that are inverses of integers. It starts as

\frac{1}{12n} -\frac{1}{320n^3}+\frac{1}{1260n^5}-\frac{1}{1680n^7}+\dots

Are there clever ways of evaluating such a polynomial so that, e.g., switching to BigInt would be transparent? The code used by Catherine Loader and translated more-or-less literally in the PR memoizes the values for 1 to 15 (where the error is greatest) then switches to Horner evaluation of polynomials with explicitly evaluated Float64 coefficients. I think it would be better to retain the integer representation of the inverse of the coefficients.

A bit more detail on the Stirling approximation. log(factorial(n)) is approximated as n * log(n) -n + log(2π*n)/2 For example

julia> n = big(35)

julia> log(factorial(n))

julia> n * log(n) - n + log(2π*n)/2

julia> n * log(n) - n + log(2π*n)/2 + inv(12n)

julia> n * log(n) - n + log(2π*n)/2 + inv(12n) - inv(320*n^3)

Why not just use the lgamma function for the log of the factorial, since this already supports BigFloat with arbitrary precision?

You have to be careful with Stirling at arbitrary precision since it is an asymptotic series (i.e. the more precision you want, the bigger n needs to be).

As I was writing this I too was beginning to think that lgamma may be a better approach. Loader argues against it for the evaluation of the binomial pdf because of cancellation. I think it would be worthwhile checking out exactly how well it does relative to the simpler formula using lgamma.

This is true. If we define logp(x,n,p) = lgamma(n+1) - lgamma(x+1) - lgamma(n-x+1) + x*log(p) + (n-x)*log1p(-p), then logp(1e6, 2e6, 0.5) gives -7.4801203440874815, whereas the correct (exactly rounded) answer is -7.480120346906837, so in Float64 precision you get 9 correct digits. (She claims that 7 digits are lost in this example, which is maybe too pessimistic by 1 digit.)

However, if you are using BigFloat anyway, then you can just throw more digits at it. This seems a lot easier than working out your own arbitrary-precision routine.

I should also mention that the combination of lgamma calls here is closely related to the lbeta function. Currently, lbeta in Base is computed via the combination of lgamma calls, but it might be nice to have a more sophisticated implementation (especially for Float64) that uses an asymptotic formula for large arguments.