Best way to use/implement Debye function in Julia?

I need the 3rd-order Debye function in a Julia routine and wonder which is the best way to use/implement it.
Searching for it in different repositories, I found various implementations, some of which based on simply solving the integral with Gauss quadrature (FewSpecialFuntions.jl), some of them implementing the approximations given in Abramowitz & Stegun, some providing an interface to the GSL. Which of these approaches is the most efficient and accurate? What are the pros and cons of the different implementations?

1 Like

What are the inputs you’re interested in? If all inputs come from some finite interval on the real line, you could implement the 3rd-order Debye function over that interval using a polynomial. This would be fast and as accurate as necessary.

Otherwise, this seems to work:

using QuadGK

function debye3(x)
  f = t -> (t*t*t)/(expm1(t))
  if iszero(x)
    one(x)
  else
    let i = quadgk(f, false, x, maxevals = Inf)
      3/(x*x*x) * first(i)
    end
  end
end

Yes, the inputs are from a finite interval on the positive part of the real line; the application are energy calculations and related stuff in the context of an equation of state.

What are the endpoints of the interval?

That depends on the temperature, the argument of the Debye function is x=TD/T, where TD is the Debye temperature, in materials of interest for me on the order of a few hundred kelvins. Temperatures can range from ~100 K to the melting temperature of common minerals, i.e., at least around 1500 to 2000 K. The lower end of the interval is 0.
Right now I am trying to key up formulas 27.1.2 and 27.1.3 from A&S to see how fast and accurate it is.

1 Like

Here is a translation of the TOMS 757 Fortran algorithm to Julia. Comparing it to @nsajko’s QuadGK-based algorithm invoked with BigFloats, it seems to be doing whtat it’s supposed to:

"""
    debye3(x::Real)
    
Calculate the Debye function of order 3 in `Float64` precision only.

Translated to Julia from the Fortran version of TOMS 757 by Peter Simon.    

The function is defined by:
   debye3(x) = 3 / x^3 * Integral ( 0 <= t <= x ) t^3 / ( exp ( t ) - 1 ) dt

The code uses Chebyshev series whose coefficients are given to 20 decimal places.

## Original Author:

Allan McLeod,
Department of Mathematics and Statistics,
Paisley University, High Street, Paisley, Scotland, PA12BE
macl_ms0@paisley.ac.uk

##  Reference
Allan McLeod,
Algorithm 757, MISCFUN: A software package to compute uncommon
special functions,
ACM Transactions on Mathematical Software,
Volume 22, Number 3, September 1996, pages 288-301.
"""
function debye3(x::Real)
    adeb3 = (2.70773706832744094526e0, 0.34006813521109175100e0, -0.1294515018444086863e-1,
             0.79637553801738164e-3, -0.5463600095908238e-4, 0.392430195988049e-5, 
            -0.28940328235386e-6, 0.2173176139625e-7, -0.165420999498e-8, 0.12727961892e-9, 
            -0.987963459e-11, 0.77250740e-12, -0.6077972e-13, 0.480759e-14, -0.38204e-15, 
             0.3048e-16, -0.244e-17, 0.20e-18, -0.2e-19)
    debinf = 0.51329911273421675946e-1
    #
    #   Float64-dependent constants
    #
    xlow, xupper = (0.298023e-7, 35.35051e0)
    xlim1, xlim2 = (708.39642, 0.9487163e103)

    x < 0 && throw(ArgumentError("x < 0"))
    
    if x < xlow
        return ((x - 7.5) * x + 20) / 20
    elseif x ≤ 4
        t = ((x*x / 8) - 0.5) - 0.5
        return cheval(adeb3, t) - 0.375 * x
    end
    #
    #   Code for x > 4.0
    #
    if xlim2 < x
        return 0.0
    else
        debye = inv(debinf * x^3)
        if x < xlim1
            expmx = exp(-x)
            if xupper < x
                sum1 = (((x + 3) * x + 6) * x + 6) / x^3
            else
                sum1 = 0.0
                rk = trunc(xlim1 / x)
                nexp = Int(rk)
                xk = rk * x
                for i in nexp:-1:1
                    xki = 1 / xk
                    t =  (((6 * xki + 6) * xki + 3) * xki + 1) / rk
                    sum1 = sum1 * expmx + t
                    rk = rk - 1
                    xk = xk - x
                end
            end
            debye -= 3 * sum1 * expmx
        end
        return debye
    end
end # function
"""
cheval(a::Tuple, t::Real)

Evaluate a Chebyshev series with coefficients stored in `a` at `t`

This function evaluates a Chebyshev series, using the
Clenshaw method with Reinsch modification, as analysed
in the paper by Oliver.  Translated to Julia from the Fortran version
of TOMS 757 by Peter Simon. Calculations are done in `Float64`.

## Original Author
    Allan McLeod,
    Department of Mathematics and Statistics,
    Paisley University, High Street, Paisley, Scotland, PA12BE
    macl_ms0@paisley.ac.uk

## References

    Allan McLeod,
    Algorithm 757, MISCFUN: A software package to compute uncommon
    special functions,
    ACM Transactions on Mathematical Software,
    Volume 22, Number 3, September 1996, pages 288-301.

    J Oliver,
    An error analysis of the modified Clenshaw method for
    evaluating Chebyshev and Fourier series,
    Journal of the IMA,
    Volume 20, 1977, pages 379-391.

"""
function cheval(a::Tuple, t::Real)
    test = 0.6
    u1 = 0.0
    n = length(a)
    if t ≤ -test #  t <= -0.6, Reinsch modification.
        d1 = d2 = 0.0
        tt = 2 * ((t + 0.5) + 0.5)
        @inbounds for i in n:-1:1
            d2 = d1
            u2 = u1
            d1 = tt * u2 + a[i] - d2
            u1 = d1 - u2
        end
        result = (d1 - d2) / 2
        return result
    elseif t < test #  -0.6 < T < 0.6, Standard Clenshaw method.
        u0 = u2 = 0.0
        tt = 2 * t
        @inbounds for i in n:-1:1
            u2 = u1
            u1 = u0
            u0 = tt * u1 + a[i] - u2
        end 
        result = (u0 - u2) / 2
        return result
    else #  0.6 <= T, Reinsch modification.
        d1 = d2 = 0.0
        tt = (t - 0.5) - 0.5
        tt = 2 * tt
        @inbounds for i in n:-1:1
            d2 = d1
            u2 = u1
            d1 = tt * u2 + a[i] + d2
            u1 = d1 + u2
        end
        result = (d1 + d2) / 2
        return result
    end 
end # function
6 Likes

What’s the rationale for using x*x*x instead of x^3? They compile to the same code, as far as I can tell:

julia> xxx(x) = x*x*x
xxx (generic function with 1 method)

julia> x3(x) = x^3
x3 (generic function with 1 method)

julia> @code_llvm xxx(2.0)
[...]
; ┌ @ operators.jl:587 within `*` @ float.jl:411
   %1 = fmul double %0, %0
   %2 = fmul double %1, %0
; â””
  ret double %2
}

julia> @code_llvm x3(2.0)
[...]
; │┌ @ operators.jl:587 within `*` @ float.jl:411
    %1 = fmul double %0, %0
    %2 = fmul double %1, %0
; └└
  ret double %2
}
2 Likes

No good reason—it’s what was in the original Fortran. I will edit the post to use x^3.

You can get more than a 2x speedup (on my machine, but probably similar elsewhere) by inlining the Clenshaw recurrence, since the coefficients are fixed. (See my discussion of this topic in my 2019 JuliaCon talk.) (The degree of the polynomial here is low enough that I doubt the Reinsch modification is worth it, either.)

Here is a macro to generate an inlined evaluation of a Chebyshev polynomial by a Clenshaw recurrence:

macro cheval(t, a...)
    n = length(a)
    u0 = u1 = u2 = 0
    tt = :(2 * t)
    for i in n:-1:1
        u2 = u1
        u1 = u0
        if u1 === 0
            u0 = :($(0.5 * a[i]) - $u2)
        else
            u0 = :(muladd($tt, $u1, $(0.5 * a[i]) - $u2))
        end
    end
    return esc(:($u0 - $u2))
end

Plugging this into your debye3 code is easy. I also made a couple of other trivial modifications (changed divisions into multiplications by reciprocials, used DomainError rather than ArgumentError, and made it clear that the debye3 implementation is only for Float64 precision, with an explicit float call to convert integer and rational types). The following code is almost 2.5x faster on my machine for debye(1.2345):

debye3 with inlined polynomial
debye3(x::Real) = _debye3(float(x))
function _debye3(x::Float64)
    debinf = 0.51329911273421675946e-1
    #
    #   Float64-dependent constants
    #
    xlow, xupper = (0.298023e-7, 35.35051e0)
    xlim1, xlim2 = (708.39642, 0.9487163e103)

    x < 0 && throw(DomainError(x, "a nonnegative argument is required"))
    
    if x < xlow
        return ((x - 7.5) * x + 20) * 0.05
    elseif x ≤ 4
        t = ((x^2 * 0.125) - 0.5) - 0.5
        return @cheval(t, 2.70773706832744094526e0, 0.34006813521109175100e0, -0.1294515018444086863e-1,
                         0.79637553801738164e-3, -0.5463600095908238e-4, 0.392430195988049e-5, 
                        -0.28940328235386e-6, 0.2173176139625e-7, -0.165420999498e-8, 0.12727961892e-9, 
                        -0.987963459e-11, 0.77250740e-12, -0.6077972e-13, 0.480759e-14, -0.38204e-15, 
                         0.3048e-16, -0.244e-17, 0.20e-18, -0.2e-19) - 0.375 * x
    end
    #
    #   Code for x > 4.0
    #
    if xlim2 < x
        return 0.0
    else
        debye = inv(debinf * x^3)
        if x < xlim1
            expmx = exp(-x)
            if xupper < x
                sum1 = (((x + 3) * x + 6) * x + 6) / x^3
            else
                sum1 = 0.0
                rk = trunc(xlim1 / x)
                nexp = Int(rk)
                xk = rk * x
                for i in nexp:-1:1
                    xki = 1 / xk
                    t =  (((6 * xki + 6) * xki + 3) * xki + 1) / rk
                    sum1 = sum1 * expmx + t
                    rk = rk - 1
                    xk = xk - x
                end
            end
            debye -= 3 * sum1 * expmx
        end
        return debye
    end
end

PS. Note, however, that the copyright status of this is problematic — TOMS code is generally illegal to distribute in open-source projects, except in a few cases where the author was able to pressure ACM to let them release it under a free/open-source license — ACM otherwise only allows “noncommercial” use, which is not free/open-source. (I really think that this journal exploited many authors’ naivete about copyright law, and as a result a whole swath of numerical code written by academics is forever locked away from free/open-source software.). Ideally you should read the paper and implement the math from pseudocode without looking at the Fortran code, which is probably okay (copyright law is tricky, though).

9 Likes

Thanks for the great improvements!

Even though the code is freely available on Netlib?

Does it come with a license, and which one, is the question.

1 Like

Freely available ≠ open source. The Netlib TOMS page specifically says

Use of ACM Algorithms is subject to the ACM Software Copyright and License Agreement

which is only “free for noncommercial use” (≠ free/open-source). (The FSF used to call this “semifree” software.)

5 Likes

NB: the accuracy suddenly becomes much worse after 4 (as x increases, there’s a huge jump in relative error when it passes 4), but slowly improves thereafter.

1 Like

Thanks to everybody for the replies. In the meantime, I have written a module that comprises an implementation based mostly on the Abramowitz & Stegun formulas. I have compared it with the TOMS method as given by Steven and with the quadrature-based version from FewSpecialFunctions.jl, and to my disappointment its performance is apparently even more terrible than I anticipated. I’ll share it nonetheless, maybe someone bothers to comment on why it is so bad.

module SomeSpecialFuncs
# even Bernoulli numbers from n=2 to n=40
B2n=(1/6,-1/30,1/42,-1/30,5/66,-691/2730,7/6,-3617/510,43867/798,-174611/330,
     854513/138,-236364091/2730,8553103/6,-23749461029/870,8615841276005/14322,
     -7709321041217/510,2577687858367/6,-26315271553053477373/1919190,
     2929993913841559/6,261082718496449122051/13530)

function bernum(n)
#  based on formula (33) from http://mathworld.wolfram.com/BernoulliNumber.html
    Bn=0
    if iseven(n)
        for k in 0:n
            is=0
            for l in 0:k
                ll=l
                is=is+(-1)^l *(big(ll)^n)*binomial(k,l)
            end
            Bn+=is/(k+1)
        end
    elseif n == 1
        Bn=-0.5
    end
    return Bn
end

function nzeta(n)
# A&S eq. 23.2.16 for even, 23.2.18 for odd (integer only)
    if iseven(n)
        return n == 0 ? -0.5 : 0.5*abs(bernum(n))*((2*pi)^n)/factorial(n)
    else
        if n == 1
            return Inf
        elseif n == 3
            # fast convergence for Apéry's constant
            eps=1e-8
            zeta=0.5
            for k in 2:20
                zo=zeta
                c=factorial(k)^2
                d=factorial(2*big(k))*k^3
                zeta+=(c/d)*(-1)^(k-1)
                abs(zeta-zo)/zo < eps && break
            end
            return 2.5*zeta
        else
            eps=1e-8
            zeta=1
            for k in 2:100
                zo=zeta
                zeta+=float(k)^(-n)
                abs(zeta-zo)/zo < eps && break
            end
        end
        return zeta
    end
end
        
function nDebye(x,n)
    x == 0 && return 1
    eps=1e-8
    if x < pi
        nD=1/n -x/(2*(n+1))
        for k in 1:length(B2n)
            nDo=nD
            k2=2*k
            nD+=(B2n[k]*x^k2)/((k2+n)*factorial(big(k2)))
            abs(nD-nDo) < eps && break
        end
    else
        nD=factorial(n)*nzeta(n+1)
        for k in 1:30
            nDo=nD
            ns=(x^n)/k
            nf=n
            for (i,j) in zip(n-1:-1:0,2:n+1)
                ns+=nf*(x^i)/k^j
                nf*=i
            end
            nD-=ns*exp(-k*x)
            abs(nD-nDo) < eps && break
        end
        nD/=x^n
    end
    nD*n
end

export bernum,nzeta,nDebye
end
1 Like

For the tests, I evaluated the 3rd-order Debye function at x=0.2 and at x=4.2, as I found that the first option in the if block corresponds to smaller x and the second to larger ones. The TOMS-derived function seems to perform vastly differently between the two values: for x=4.2 it took two orders of magnitude longer (unless I did something wrong with @btime). The quadrature-based function performs evenly for both x but is slower than TOMS for x=0.2.

1 Like

This is a slow way to implement a power series. You always want to compute each term as a recurrence from the next. See also And Julia keeps amazing me: high precision computation - #6 by stevengj

2 Likes

Yes, the TOMS function uses a different algorithm for x > 4. However, the algorithm gets increasingly faster as x increases beyond 4. On the other hand, the quadrature-based algorithm gets slower as x increases. Below, debye3 is the TOMS-based routine with @stevengj’s improvements, and debye3int is the quadrature-based routine:

julia> @btime debye3(4.01)
  346.544 ns (0 allocations: 0 bytes)
0.1809354704772911

julia> @btime debye3int(4.01)
  126.185 ns (0 allocations: 0 bytes)
0.18093547047726108

julia> @btime debye3(5.61)
  250.266 ns (0 allocations: 0 bytes)
0.09099171066428026

julia> @btime debye3int(5.61)
  128.781 ns (0 allocations: 0 bytes)
0.09099171066428302

julia> @btime debye3(5.62)
  256.117 ns (0 allocations: 0 bytes)
0.09061624496519508

julia> @btime debye3int(5.62)  # debye3 is faster from here onwards
  420.101 ns (2 allocations: 368 bytes)
0.09061624496519607

julia> @btime debye3(10)
  142.993 ns (0 allocations: 0 bytes)
0.01929576569034549

julia> @btime debye3int(10)
  426.633 ns (2 allocations: 368 bytes)
0.019295765690345492

julia> @btime debye3(20)
  79.442 ns (0 allocations: 0 bytes)
0.002435220067480548

julia> @btime debye3int(20)
  683.784 ns (2 allocations: 368 bytes)
0.002435220067480548

Somewhere between 5.61 and 5.62, the quadrature routine decided it needs more samples of the integrand, increasing the run time. After this point the TOMS-based function is faster by an amount that increases with increasing x.

1 Like

I agree that recurrence relations are a great way to save time, but I don’t see how to make one here. bernum is used only once per call to nDebye to calculate a single Bernoulli number. I only noticed that the function should have been written more compactly as

function bernum(n)
#  based on formula (33) from http://mathworld.wolfram.com/BernoulliNumber.html
    Bn=0
    if iseven(n)
        for k in 0:n
            is=sum([(-1)^l *(big(l)^n)*binomial(k,l) for l in 0:k])
            Bn+=is/(k+1)
        end
    elseif n == 1
        Bn=-0.5
    end
    return Bn
end

but that does virtually nothing to make it faster.

For example, you shouldn’t need to call binomial for each term in the sum, but rather compute each ten from the previous term. (And allocating an array and them calling sum is also terribly inefficient.) But if bernum isn’t performance critical, of course it doesn’t matter.

1 Like

But this loop body is in your performance critical function, right? The same advice applies: compute the factorials and powers iteratively from the previous summand, not separately for each term. By doing so you can also avoid spurious overflow—you definitely want to avoid big

1 Like