Slower summation of a series in Julia than in Python

Hello,

I was doing some tests in Julia and Python, and I found that the following code was much slower in Julia. I do not see why this would be the case:

harmonic(k) = (-1)^k / (k + 1)

function sₙ(an, n)
    S = 0.0
    for k in 0:n
        S += an(k)
    end
    return S
end

julia> @time sₙ(harmonic, 500_000)
   0.019135 seconds
0.6931481805570047

In Python I used this code:

import timeit

def harmonic(k):
    return (-1) ** k / (k + 1)

def sum_n(an, n):
    S = 0.0
    for k in range(n+1):
        S += an(k)
    return S

def main():
    start = timeit.timeit()
    sum_n(harmonic, 500000)
    end = timeit.timeit()
    print(abs(end - start))

>>> main()
0.000324424999738713

Any idea why this is happening?

Which Julia version are you running? The way your microbenchmark is structured and the output you’re getting suggest you’re accidentally including the compilation time of the Julia code in your measurement. Try running it a second time in the same session.

Also, since you write -1 in harmonic (an Int literal) instead of -1.0 (a floating-point literal), there may be some conversion cost as well.

5 Likes

Indeed, just changing (-1)^k to (-1.0)^k makes the code faster.

I’m running Julia 1.7. The compilation time is not included, since I executed @time a second time.

However, I just tried your second suggestion and you are absolutely right. I thought that there was no extra cost to (-1)^k over (-1.0)^k, but it seems I got that wrong.

It seems to be even faster if you use: harmonic(k) = ifelse(isodd(k), -1, 1) / (k + 1)

1 Like

timeit.timeit does not work this way. Each call to timeit.timeit returns the time taken to run the statement given to it, which in your case with timeit.timeit() is no statement at all. So your main simply returns the difference between two calls to timeit.timeit, each of which reports the time taken to run nothing, which is probably why the difference is sometimes negative and why you used abs to hide that.
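A minimal sketch of what those two calls actually measure:

```python
import timeit

# With no statement argument, timeit.timeit() times the default
# statement "pass", executed `number` times (1_000_000 by default).
# Two such calls measure two independent runs of "pass", so their
# difference is pure timing noise and can come out negative.
t1 = timeit.timeit(number=100_000)
t2 = timeit.timeit(number=100_000)
print(t1 - t2)  # noise, unrelated to sum_n
```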

To correctly calculate the time taken to run sum_n(harmonic, 500000), you need to do:

import timeit

timeit.timeit("sum_n(harmonic, 500000)", setup="""
def harmonic(k):
    return (-1) ** k / (k + 1)

def sum_n(an, n):
    S = 0.0
    for k in range(n+1):
        S += an(k)
    return S
""", number=1)

Note however that timeit.timeit by default runs the statement you supply many times (a million), which is why number=1 is passed above. Use the number argument to get an answer in an acceptable time.

Doing that for me gives about 250 ms for the Python code above, so it is much slower than your original Julia.

13 Likes

Not a performance tip, but you can simplify the sₙ function like this:
sₙ(an, n) = sum(an(k) for k in 0:n)

1 Like

I can get the wrong answer even quicker!

julia> @btime sₙ(harmonic, 500_000)
  648.000 μs (0 allocations: 0 bytes)
0.6931481805570047

julia> function sₙ(an, n)
    S = 0.0
    for k in 0:n
        @fastmath S += an(k)
    end
    return S
end

julia> @btime sₙ(harmonic, 500_000)
  444.600 μs (0 allocations: 0 bytes)
0.6931481805569676

Yes, definitely do this instead of (-1.0)^k or (-1)^k. Repeatedly calculating a power for gradually increasing exponents is needlessly wasteful. It’s nothing to do with conversion cost, just that (-1)^500_000 needs to do approximately log2(500_000) = 19 multiplications, for each iteration. It doesn’t really make any sense to redo all of those calculations over and over.

The same goes for Python, use some version of isodd, or keep track of the previous value of (-1)^k and get the next by just multiplying by -1.
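In Python, that running-sign version might look like this (a sketch; the name sum_n_sign is just for illustration):

```python
def sum_n_sign(n):
    # Track the sign of (-1)^k explicitly instead of recomputing the
    # power on every iteration: flip it by multiplying by -1.
    S = 0.0
    sign = 1.0
    for k in range(n + 1):
        S += sign / (k + 1)
        sign = -sign
    return S

print(sum_n_sign(500_000))  # close to ln(2) ≈ 0.6931
```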

This is a common performance pitfall. Too often you can find code that implements something like \sum_{k=1}^N x^k/k! dutifully by

function slowsum(x, n)
    val = 0.0
    for i in 1:n
        val += x^i / factorial(i)
    end
    return val
end

instead of

function fastsum(x, n)
    val = 0.0
    part = 1.0
    for i in 1:n
        part *= x / i
        val += part
    end
    return val
end

with results like these:

1.7.2> @btime slowsum(x, 15) setup=(x=2+rand())
  377.451 ns (0 allocations: 0 bytes)
15.747545023566579

1.7.2> @btime fastsum(x, 15) setup=(x=2+rand())
  12.613 ns (0 allocations: 0 bytes)
10.771366140823954

Not to mention the overflow behaviour of slowsum.
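For readers following along on the Python side, the same recurrence trick translates directly (a sketch; note that Python integers don't overflow, so the overflow caveat is Julia-specific):

```python
import math

def slowsum(x, n):
    # Recomputes x**i and factorial(i) from scratch on every iteration.
    val = 0.0
    for i in range(1, n + 1):
        val += x ** i / math.factorial(i)
    return val

def fastsum(x, n):
    # Updates the running term x^i / i! incrementally:
    # each term is the previous one times x / i.
    val = 0.0
    part = 1.0
    for i in range(1, n + 1):
        part *= x / i
        val += part
    return val

x = 2.5
print(slowsum(x, 15), fastsum(x, 15))  # agree to floating-point accuracy
```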

4 Likes

The FastPow.jl package will do this transformation for you on (-1)^k if you just put @fastpow in front of your function.

Note that it’s generally even better to compute each term in a series by iteratively updating the previous term, like this:

S = 0.0
term = 1
for k = 0:n
    S += term / (k+1)
    term = -term
end

or you could even unroll:

S = 0.0
for k = 0:2:n
    S += inv(k+1) - inv(k+2)
end

These kinds of tricks become even more important for accumulating series where the terms are more complicated, e.g. \sum_{k=0}^n z^k / k!:

S = 0.0
term = one(z) / 1
for k = 0:n
    S += term
    term *= z / (k+1)
end
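
Translating that last pattern to Python and checking it against math.exp (a sketch; the name exp_series is just for illustration):

```python
import math

def exp_series(z, n):
    # Accumulate sum_{k=0}^{n} z^k / k! by updating the running term:
    # term starts at z^0 / 0! = 1, and each step multiplies by z / (k+1).
    S = 0.0
    term = 1.0
    for k in range(n + 1):
        S += term
        term *= z / (k + 1)
    return S

print(exp_series(0.5, 30))  # close to math.exp(0.5)
```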

8 Likes

That’s some coincidence: you chose the exact same example as I did while we were simultaneously editing 😁

9 Likes