I need the 3rd-order Debye function in a Julia routine and wonder which is the best way to use/implement it.
Searching for it in different repositories, I found various implementations: some based on simply evaluating the integral with Gauss quadrature (FewSpecialFunctions.jl), some implementing the approximations given in Abramowitz & Stegun, and some providing an interface to the GSL. Which of these approaches is the most efficient and accurate? What are the pros and cons of the different implementations?
What are the inputs you’re interested in? If all inputs come from some finite interval on the real line, you could implement the 3rd-order Debye function over that interval using a polynomial; this would be fast and as accurate as necessary (a sketch of that idea follows the code below).
Otherwise, this seems to work:
using QuadGK

function debye3(x)
    # integrand t^3 / (exp(t) - 1), written with expm1 for accuracy near t = 0
    f = t -> (t*t*t)/(expm1(t))
    if iszero(x)
        one(x)  # D3(0) = 1
    else
        let i = quadgk(f, false, x, maxevals = Inf)
            3/(x*x*x) * first(i)
        end
    end
end
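To make the interval-polynomial suggestion above concrete, here is a minimal sketch (the helper names chebcoeffs/chebeval, the interval [0, 20], and the degree 40 are my own choices, not anything standardized): sample the quadrature routine above at Chebyshev points of the interval, form the interpolant’s coefficients, and evaluate the polynomial with a Clenshaw recurrence.

function chebcoeffs(f, a, b, n)
    # coefficients c_0 .. c_n of the degree-n Chebyshev interpolant of f on [a, b]
    xk = [cos((k + 0.5)*π/(n + 1)) for k in 0:n]        # Chebyshev nodes on [-1, 1]
    fk = [f((a + b)/2 + (b - a)/2*x) for x in xk]       # samples of f on [a, b]
    [2/(n + 1) * sum(fk[k+1]*cos(j*(k + 0.5)*π/(n + 1)) for k in 0:n) for j in 0:n]
end

function chebeval(c, a, b, x)
    # Clenshaw evaluation of c[1]/2 + sum_j c[j+1]*T_j(t), with t mapped from [a, b]
    t = (2x - a - b)/(b - a)
    u1 = u2 = 0.0
    for j in length(c):-1:2
        u1, u2 = 2t*u1 - u2 + c[j], u1
    end
    t*u1 - u2 + c[1]/2
end

c = chebcoeffs(debye3, 0.0, 20.0, 40)                   # one-time setup cost
chebeval(c, 0.0, 20.0, 3.7) ≈ debye3(3.7)               # expected: true, to roughly the accuracy of the quadrature reference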
Yes, the inputs are from a finite interval on the positive part of the real line; the application is energy calculations and related quantities in the context of an equation of state.
What are the endpoints of the interval?
That depends on the temperature: the argument of the Debye function is x = TD/T, where TD is the Debye temperature, which for materials of interest to me is on the order of a few hundred kelvins. Temperatures can range from ~100 K up to the melting temperature of common minerals, i.e., at least around 1500 to 2000 K. The lower end of the interval is 0.
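To put rough numbers on that (TD = 600 K is a hypothetical value, purely for illustration): x = TD/T would then run from about 600/2000 = 0.3 near the melting point up to 600/100 = 6 at 100 K, so the upper endpoint of the interval is on the order of 10, with the exact value depending on the material.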
Right now I am trying to code up formulas 27.1.2 and 27.1.3 from A&S to see how fast and accurate that approach is.
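For reference, here is my reading of the two expansions involved (I have not re-checked the equation numbers against A&S): a Bernoulli-number power series for small x,

$$
\frac{n}{x^n}\int_0^x \frac{t^n}{e^t-1}\,dt \;=\; 1 - \frac{n x}{2(n+1)} + n\sum_{k=1}^{\infty}\frac{B_{2k}\,x^{2k}}{(2k+n)\,(2k)!}, \qquad |x| < 2\pi,
$$

and an exponentially convergent tail series for large x,

$$
\int_x^{\infty}\frac{t^n}{e^t-1}\,dt \;=\; \sum_{k=1}^{\infty} e^{-kx}\left(\frac{x^n}{k} + \frac{n\,x^{n-1}}{k^2} + \cdots + \frac{n!}{k^{n+1}}\right), \qquad x > 0,
$$

combined with \(\int_0^{\infty} t^n/(e^t-1)\,dt = n!\,\zeta(n+1)\).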
Here is a translation of the TOMS 757 Fortran algorithm to Julia. Comparing it to @nsajko’s QuadGK-based algorithm invoked with BigFloats, it seems to be doing what it’s supposed to:
"""
debye3(x::Real)
Calculate the Debye function of order 3 in `Float64` precision only.
Translated to Julia from the Fortran version of TOMS 757 by Peter Simon.
The function is defined by:
debye3(x) = 3 / x^3 * Integral ( 0 <= t <= x ) t^3 / ( exp ( t ) - 1 ) dt
The code uses Chebyshev series whose coefficients are given to 20 decimal places.
## Original Author:
Allan McLeod,
Department of Mathematics and Statistics,
Paisley University, High Street, Paisley, Scotland, PA12BE
macl_ms0@paisley.ac.uk
## Reference
Allan McLeod,
Algorithm 757, MISCFUN: A software package to compute uncommon
special functions,
ACM Transactions on Mathematical Software,
Volume 22, Number 3, September 1996, pages 288-301.
"""
function debye3(x::Real)
    adeb3 = (2.70773706832744094526e0, 0.34006813521109175100e0, -0.1294515018444086863e-1,
             0.79637553801738164e-3, -0.5463600095908238e-4, 0.392430195988049e-5,
             -0.28940328235386e-6, 0.2173176139625e-7, -0.165420999498e-8, 0.12727961892e-9,
             -0.987963459e-11, 0.77250740e-12, -0.6077972e-13, 0.480759e-14, -0.38204e-15,
             0.3048e-16, -0.244e-17, 0.20e-18, -0.2e-19)
    debinf = 0.51329911273421675946e-1
    #
    # Float64-dependent constants
    #
    xlow, xupper = (0.298023e-7, 35.35051e0)
    xlim1, xlim2 = (708.39642, 0.9487163e103)
    x < 0 && throw(ArgumentError("x < 0"))
    if x < xlow
        return ((x - 7.5) * x + 20) / 20
    elseif x ≤ 4
        t = ((x*x / 8) - 0.5) - 0.5
        return cheval(adeb3, t) - 0.375 * x
    end
    #
    # Code for x > 4.0
    #
    if xlim2 < x
        return 0.0
    else
        debye = inv(debinf * x^3)
        if x < xlim1
            expmx = exp(-x)
            if xupper < x
                sum1 = (((x + 3) * x + 6) * x + 6) / x^3
            else
                sum1 = 0.0
                rk = trunc(xlim1 / x)
                nexp = Int(rk)
                xk = rk * x
                for i in nexp:-1:1
                    xki = 1 / xk
                    t = (((6 * xki + 6) * xki + 3) * xki + 1) / rk
                    sum1 = sum1 * expmx + t
                    rk = rk - 1
                    xk = xk - x
                end
            end
            debye -= 3 * sum1 * expmx
        end
        return debye
    end
end # function
"""
cheval(a::Tuple, t::Real)
Evaluate a Chebyshev series with coefficients stored in `a` at `t`
This function evaluates a Chebyshev series, using the
Clenshaw method with Reinsch modification, as analysed
in the paper by Oliver. Translated to Julia from the Fortran version
of TOMS 757 by Peter Simon. Calculations are done in `Float64`.
## Original Author
Allan McLeod,
Department of Mathematics and Statistics,
Paisley University, High Street, Paisley, Scotland, PA12BE
macl_ms0@paisley.ac.uk
## References
Allan McLeod,
Algorithm 757, MISCFUN: A software package to compute uncommon
special functions,
ACM Transactions on Mathematical Software,
Volume 22, Number 3, September 1996, pages 288-301.
J Oliver,
An error analysis of the modified Clenshaw method for
evaluating Chebyshev and Fourier series,
Journal of the IMA,
Volume 20, 1977, pages 379-391.
"""
function cheval(a::Tuple, t::Real)
    test = 0.6
    u1 = 0.0
    n = length(a)
    if t ≤ -test  # t <= -0.6, Reinsch modification.
        d1 = d2 = 0.0
        tt = 2 * ((t + 0.5) + 0.5)
        @inbounds for i in n:-1:1
            d2 = d1
            u2 = u1
            d1 = tt * u2 + a[i] - d2
            u1 = d1 - u2
        end
        result = (d1 - d2) / 2
        return result
    elseif t < test  # -0.6 < T < 0.6, Standard Clenshaw method.
        u0 = u2 = 0.0
        tt = 2 * t
        @inbounds for i in n:-1:1
            u2 = u1
            u1 = u0
            u0 = tt * u1 + a[i] - u2
        end
        result = (u0 - u2) / 2
        return result
    else  # 0.6 <= T, Reinsch modification.
        d1 = d2 = 0.0
        tt = (t - 0.5) - 0.5
        tt = 2 * tt
        @inbounds for i in n:-1:1
            d2 = d1
            u2 = u1
            d1 = tt * u2 + a[i] + d2
            u1 = d1 + u2
        end
        result = (d1 + d2) / 2
        return result
    end
end # function
What’s the rationale for using x*x*x instead of x^3? They compile to the same code, as far as I can tell:
julia> xxx(x) = x*x*x
xxx (generic function with 1 method)
julia> x3(x) = x^3
x3 (generic function with 1 method)
julia> @code_llvm xxx(2.0)
[...]
; ┌ @ operators.jl:587 within `*` @ float.jl:411
%1 = fmul double %0, %0
%2 = fmul double %1, %0
; └
ret double %2
}
julia> @code_llvm x3(2.0)
[...]
; │┌ @ operators.jl:587 within `*` @ float.jl:411
%1 = fmul double %0, %0
%2 = fmul double %1, %0
; └└
ret double %2
}
No good reason—it’s what was in the original Fortran. I will edit the post to use x^3.
You can get more than a 2x speedup (on my machine, but probably similar elsewhere) by inlining the Clenshaw recurrence, since the coefficients are fixed. (See my discussion of this topic in my 2019 JuliaCon talk.) (The degree of the polynomial here is low enough that I doubt the Reinsch modification is worth it, either.)
Here is a macro to generate an inlined evaluation of a Chebyshev polynomial by a Clenshaw recurrence:
macro cheval(t, a...)
    # Unroll the Clenshaw recurrence at macro-expansion time; the fixed
    # coefficients are folded (halved) into the generated expression, so no
    # final division by 2 is needed.
    n = length(a)
    u0 = u1 = u2 = 0
    tt = :(2 * $t)  # the argument expression, doubled
    for i in n:-1:1
        u2 = u1
        u1 = u0
        if u1 === 0
            u0 = :($(0.5 * a[i]) - $u2)
        else
            u0 = :(muladd($tt, $u1, $(0.5 * a[i]) - $u2))
        end
    end
    return esc(:($u0 - $u2))
end
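As a quick sanity check (assuming the loop-based cheval from the post above is also in scope), the generated expression should agree with it to roundoff for |t| < 0.6, where both use the standard Clenshaw recurrence:

t = 0.25
@cheval(t, 1.0, -0.5, 0.25) ≈ cheval((1.0, -0.5, 0.25), t)   # expected: true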
Plugging this into your debye3 code is easy. I also made a couple of other trivial modifications (changed divisions into multiplications by reciprocals, used DomainError rather than ArgumentError, and made it clear that the debye3 implementation is only for Float64 precision, with an explicit float call to convert integer and rational types). The following code is almost 2.5x faster on my machine for debye3(1.2345):
debye3 with inlined polynomial
debye3(x::Real) = _debye3(float(x))

function _debye3(x::Float64)
    debinf = 0.51329911273421675946e-1
    #
    # Float64-dependent constants
    #
    xlow, xupper = (0.298023e-7, 35.35051e0)
    xlim1, xlim2 = (708.39642, 0.9487163e103)
    x < 0 && throw(DomainError(x, "a nonnegative argument is required"))
    if x < xlow
        return ((x - 7.5) * x + 20) * 0.05
    elseif x ≤ 4
        t = ((x^2 * 0.125) - 0.5) - 0.5
        return @cheval(t, 2.70773706832744094526e0, 0.34006813521109175100e0, -0.1294515018444086863e-1,
                       0.79637553801738164e-3, -0.5463600095908238e-4, 0.392430195988049e-5,
                       -0.28940328235386e-6, 0.2173176139625e-7, -0.165420999498e-8, 0.12727961892e-9,
                       -0.987963459e-11, 0.77250740e-12, -0.6077972e-13, 0.480759e-14, -0.38204e-15,
                       0.3048e-16, -0.244e-17, 0.20e-18, -0.2e-19) - 0.375 * x
    end
    #
    # Code for x > 4.0
    #
    if xlim2 < x
        return 0.0
    else
        debye = inv(debinf * x^3)
        if x < xlim1
            expmx = exp(-x)
            if xupper < x
                sum1 = (((x + 3) * x + 6) * x + 6) / x^3
            else
                sum1 = 0.0
                rk = trunc(xlim1 / x)
                nexp = Int(rk)
                xk = rk * x
                for i in nexp:-1:1
                    xki = 1 / xk
                    t = (((6 * xki + 6) * xki + 3) * xki + 1) / rk
                    sum1 = sum1 * expmx + t
                    rk = rk - 1
                    xk = xk - x
                end
            end
            debye -= 3 * sum1 * expmx
        end
        return debye
    end
end
PS. Note, however, that the copyright status of this is problematic — TOMS code is generally illegal to distribute in open-source projects, except in a few cases where the author was able to pressure ACM to let them release it under a free/open-source license — ACM otherwise only allows “noncommercial” use, which is not free/open-source. (I really think that this journal exploited many authors’ naivete about copyright law, and as a result a whole swath of numerical code written by academics is forever locked away from free/open-source software.) Ideally you should read the paper and implement the math from pseudocode without looking at the Fortran code, which is probably okay (copyright law is tricky, though).
Thanks for the great improvements!
Even though the code is freely available on Netlib?
Does it come with a license, and if so, which one? That is the question.
Freely available ≠ open source. The Netlib TOMS page specifically says
Use of ACM Algorithms is subject to the ACM Software Copyright and License Agreement
which is only “free for noncommercial use” (≠ free/open-source). (The FSF used to call this “semifree” software.)
NB: the accuracy suddenly becomes much worse after 4 (as x increases, there’s a huge jump in relative error when it passes 4), but slowly improves thereafter.
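For anyone who wants to reproduce that, a rough sketch (the sample points and tolerance are arbitrary choices of mine): compare the Float64 routine against a BigFloat quadrature reference around the branch switch at x = 4.

using QuadGK

# BigFloat reference via direct quadrature of the defining integral
debye3_big(x) = 3/big(x)^3 * first(quadgk(t -> t^3/expm1(t), big(0), big(x), rtol = 1e-30))

for x in (3.9, 3.99, 4.0, 4.01, 4.1, 6.0, 10.0)
    relerr = Float64(abs(debye3(x) - debye3_big(x))/debye3_big(x))
    println("x = ", x, "   rel. error ≈ ", relerr)
end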
Thanks to everybody for the replies. In the meantime, I have written a module that comprises an implementation based mostly on the Abramowitz & Stegun formulas. I have compared it with the TOMS method as given by Steven and with the quadrature-based version from FewSpecialFunctions.jl, and to my disappointment its performance is even worse than I anticipated. I’ll share it nonetheless; maybe someone can comment on why it is so bad.
module SomeSpecialFuncs

# even Bernoulli numbers from n=2 to n=40
B2n = (1/6, -1/30, 1/42, -1/30, 5/66, -691/2730, 7/6, -3617/510, 43867/798, -174611/330,
       854513/138, -236364091/2730, 8553103/6, -23749461029/870, 8615841276005/14322,
       -7709321041217/510, 2577687858367/6, -26315271553053477373/1919190,
       2929993913841559/6, -261082718496449122051/13530)

function bernum(n)
    # based on formula (33) from http://mathworld.wolfram.com/BernoulliNumber.html
    Bn = 0
    if iseven(n)
        for k in 0:n
            is = 0
            for l in 0:k
                is = is + (-1)^l * (big(l)^n) * binomial(k, l)
            end
            Bn += is/(k + 1)
        end
    elseif n == 1
        Bn = -0.5
    end
    return Bn
end

function nzeta(n)
    # A&S eq. 23.2.16 for even, 23.2.18 for odd (integer only)
    if iseven(n)
        return n == 0 ? -0.5 : 0.5*abs(bernum(n))*((2*pi)^n)/factorial(n)
    else
        if n == 1
            return Inf
        elseif n == 3
            # fast convergence for Apéry's constant
            eps = 1e-8
            zeta = 0.5
            for k in 2:20
                zo = zeta
                c = factorial(k)^2
                d = factorial(2*big(k))*k^3
                zeta += (c/d)*(-1)^(k-1)
                abs(zeta - zo)/zo < eps && break
            end
            return 2.5*zeta
        else
            eps = 1e-8
            zeta = 1
            for k in 2:100
                zo = zeta
                zeta += float(k)^(-n)
                abs(zeta - zo)/zo < eps && break
            end
        end
        return zeta
    end
end

function nDebye(x, n)
    # nth-order Debye function D_n(x)
    x == 0 && return 1
    eps = 1e-8
    if x < pi
        # power series in x with even Bernoulli numbers
        nD = 1/n - x/(2*(n + 1))
        for k in 1:length(B2n)
            nDo = nD
            k2 = 2*k
            nD += (B2n[k]*x^k2)/((k2 + n)*factorial(big(k2)))
            abs(nD - nDo) < eps && break
        end
    else
        # exponentially convergent tail series subtracted from n! * zeta(n+1)
        nD = factorial(n)*nzeta(n + 1)
        for k in 1:30
            nDo = nD
            ns = (x^n)/k
            nf = n
            for (i, j) in zip(n-1:-1:0, 2:n+1)
                ns += nf*(x^i)/k^j
                nf *= i
            end
            nD -= ns*exp(-k*x)
            abs(nD - nDo) < eps && break
        end
        nD /= x^n
    end
    nD*n
end

export bernum, nzeta, nDebye

end
For the tests, I evaluated the 3rd-order Debye function at x=0.2 and at x=4.2, as I found that the first option in the if block corresponds to smaller x and the second to larger ones. The TOMS-derived function seems to perform vastly differently between the two values: for x=4.2 it took two orders of magnitude longer (unless I did something wrong with @btime). The quadrature-based function performs evenly for both x but is slower than TOMS for x=0.2.
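The timing calls were of roughly this form (a reconstruction, not necessarily the exact invocation):

using BenchmarkTools
using .SomeSpecialFuncs   # the module above, defined in the same session

@btime nDebye(0.2, 3)
@btime nDebye(4.2, 3)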
This is a slow way to implement a power series. You always want to compute each term as a recurrence from the previous one. See also And Julia keeps amazing me: high precision computation - #6 by stevengj
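For instance, in the small-x branch of nDebye the factor x^(2k)/(2k)! can be carried as a running product, so each term costs one multiplication and one division instead of a fresh power and a big factorial. A sketch of that idea (it assumes the B2n tuple from the module above is in scope; the function name is mine):

function nDebye_series(x, n; eps = 1e-8)
    # small-x branch only, with the series term computed by recurrence
    nD = 1/n - x/(2*(n + 1))
    p = 1.0                        # running value of x^(2k)/(2k)!
    for k in 1:length(B2n)
        k2 = 2*k
        p *= x^2/((k2 - 1)*k2)     # update x^(2k-2)/(2k-2)!  ->  x^(2k)/(2k)!
        term = B2n[k]*p/(k2 + n)
        nD += term
        abs(term) < eps && break
    end
    return n*nD
end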
Yes, the TOMS function uses a different algorithm for x > 4. However, the algorithm gets increasingly faster as x increases beyond 4. On the other hand, the quadrature-based algorithm gets slower as x increases. Below, debye3 is the TOMS-based routine with @stevengj’s improvements, and debye3int is the quadrature-based routine:
julia> @btime debye3(4.01)
346.544 ns (0 allocations: 0 bytes)
0.1809354704772911
julia> @btime debye3int(4.01)
126.185 ns (0 allocations: 0 bytes)
0.18093547047726108
julia> @btime debye3(5.61)
250.266 ns (0 allocations: 0 bytes)
0.09099171066428026
julia> @btime debye3int(5.61)
128.781 ns (0 allocations: 0 bytes)
0.09099171066428302
julia> @btime debye3(5.62)
256.117 ns (0 allocations: 0 bytes)
0.09061624496519508
julia> @btime debye3int(5.62) # debye3 is faster from here onwards
420.101 ns (2 allocations: 368 bytes)
0.09061624496519607
julia> @btime debye3(10)
142.993 ns (0 allocations: 0 bytes)
0.01929576569034549
julia> @btime debye3int(10)
426.633 ns (2 allocations: 368 bytes)
0.019295765690345492
julia> @btime debye3(20)
79.442 ns (0 allocations: 0 bytes)
0.002435220067480548
julia> @btime debye3int(20)
683.784 ns (2 allocations: 368 bytes)
0.002435220067480548
Somewhere between 5.61 and 5.62, the quadrature routine decided it needs more samples of the integrand, increasing the run time. After this point the TOMS-based function is faster by an amount that increases with increasing x.
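That can be seen directly by counting integrand evaluations (a sketch; the wrapper name is mine, and it assumes the same integrand and default quadgk settings as debye3int):

using QuadGK

function debye3int_counted(x)
    # same integral as the quadrature-based routine, but counting integrand calls
    n = Ref(0)
    f = t -> (n[] += 1; t^3/expm1(t))
    val = iszero(x) ? one(x) : 3/x^3 * first(quadgk(f, 0, x))
    return val, n[]
end

debye3int_counted(5.61)   # (value, evaluation count)
debye3int_counted(5.62)   # the count should jump here, matching the timing jump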
I agree that recurrence relations are a great way to save time, but I don’t see how to make one here. bernum is used only once per call to nDebye to calculate a single Bernoulli number. I only noticed that the function should have been written more compactly as
function bernum(n)
    # based on formula (33) from http://mathworld.wolfram.com/BernoulliNumber.html
    Bn = 0
    if iseven(n)
        for k in 0:n
            is = sum([(-1)^l * (big(l)^n) * binomial(k, l) for l in 0:k])
            Bn += is/(k + 1)
        end
    elseif n == 1
        Bn = -0.5
    end
    return Bn
end
but that does virtually nothing to make it faster.
For example, you shouldn’t need to call binomial for each term in the sum, but rather compute each term from the previous one. (And allocating an array and then calling sum is also terribly inefficient.) But if bernum isn’t performance critical, of course it doesn’t matter.
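Concretely, the inner sum could be computed with the binomial coefficient updated via C(k, l) = C(k, l-1) * (k - l + 1) / l; a sketch (the helper name is mine):

function bernum_innersum(k, n)
    # sum_{l=0}^{k} (-1)^l * l^n * binomial(k, l), with binomial(k, l) by recurrence
    c = big(1)                        # binomial(k, 0)
    s = n == 0 ? big(1) : big(0)      # l = 0 term is 0^n * binomial(k, 0)
    for l in 1:k
        c = c*(k - l + 1) ÷ l         # binomial(k, l) from binomial(k, l-1), exact
        s += (-1)^l * big(l)^n * c
    end
    return s
end

bernum would then accumulate Bn += bernum_innersum(k, n)/(k + 1) as before.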
But this loop body is in your performance critical function, right? The same advice applies: compute the factorials and powers iteratively from the previous summand, not separately for each term. By doing so you can also avoid spurious overflow—you definitely want to avoid big