Apparent mismatch of run times during summation

Hi guys, Julia newbie here. I have a very simple question that I haven’t been able to get my head around.

I ultimately want to compute the sum of f(n)*g(n)*x[n-1], where n ranges from 1 to a predetermined integer, f and g are functions, and x is a predetermined array.
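Schematically, the goal looks something like this (N, x, f, and g here are placeholders of my own, and since Julia arrays are 1-based, the x[n-1] term needs n to start at 2 unless x is an offset array):

N = 100                                   # predetermined upper limit
x = rand(N)                               # placeholder data
f(n) = 1 / (n^2 + 1)                      # placeholder functions
g(n) = 1 / (n^2 + im)
total = sum(f(n) * g(n) * x[n-1] for n in 2:N)

I will, however, illustrate my confusion with a very simple example. Suppose I have two functions I wish to sum, f1 and f2.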

f1=x->1/(x^2+1)
f2=x->1/(x^2+im)

Now, suppose I write

using BenchmarkTools
@btime f2(1)

my output is, unsurprisingly,

0.001 ns (0 allocations: 0 bytes)

Same with f1(1). If I now write

@btime f2(1) + f2(2)

I still find

0.001 ns (0 allocations: 0 bytes)

If I time

@btime sum(f2(x) for x in 1:100)

I now find

2.311 μs (0 allocations: 0 bytes)

Arithmetic with complex numbers appears to take 0.001 ns each time I measure it, and calling the function also takes 0.001 ns, so taking those runtimes at face value, summing 100 complex numbers should take less than 1 ns. Yet it takes more than 1,000 times what this naive estimate suggests.

In addition, the code

@btime sum(f1(x) for x in 1:100)

outputs

88.168 ns (0 allocations: 0 bytes)

So, there are a few things that I want to ask.

  1. Why is my summation taking significantly more time than the time it takes to calculate each function and sum 100 complex numbers?
  2. Why is there such a large discrepancy between the real and complex functions? Complex arithmetic should not be 20 times slower.
  3. Is something else happening under the hood that I’m not aware of? For instance, I have read that summation incorporates special techniques to reduce cancellation errors.
  4. Is there a way for me to actually sum this function in <= 1 ns, as I expect to be able to do?

Any help would be greatly appreciated!

@btime is lying to you. Whenever you see a number less than 1 ns, it means the compiler has replaced your calculation with a constant value.
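You can actually watch this happen with @code_llvm, using a small wrapper (h and wrapped are just names made up for this demo):

julia> h(x) = 1/(x^2 + 1)
h (generic function with 1 method)

julia> wrapped() = h(1)   # literal argument, like the one in @btime f1(1)
wrapped (generic function with 1 method)

julia> @code_llvm wrapped()   # body should reduce to a bare `ret double 5.0e-1`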


Thanks for that! I did not know that @btime did that.

However, let us assume that the calculation of f2(1) takes only slightly less than 1 ns, and that adding two complex numbers also takes slightly less than 1 ns. Shouldn't 100 of these operations then take around 200 ns, not 2,300 ns?

It’s not that @btime does this, it’s constant propagation in the compiler. When you have simple functions, you can prevent the compiler from constant-propagating into the benchmark by benchmarking like so:

julia> @btime f1($Ref(1)[])
  29.406 ns (1 allocation: 16 bytes)
0.5

julia> @btime f2($Ref(1)[])
  37.100 ns (1 allocation: 32 bytes)
0.5 - 0.5im
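
Side note: the spelling the BenchmarkTools manual itself uses is $(Ref(1))[], which constructs the Ref once at interpolation time rather than on every evaluation:

julia> @btime f1($(Ref(1))[])   # timing omitted; it will vary by machine

Either way, the point is that the argument is no longer a compile-time constant.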

Hm, when checking this I noticed

f1=x->1/(x^2+1)
f2=x->1/(x^2+im)

using BenchmarkTools

println("lambda")
@btime f1(Ref(x)[]) setup=(x=1)
@btime f2(Ref(y)[]) setup=(y=2)
@btime f2(Ref(x)[]) * f2(Ref(y)[]) setup=(x=1; y=2)
@btime sum(f1(x) for x in 1:100) 
@btime sum(f2(x) for x in 1:100)
@btime sum(f1(x) * f2(x) for x in 1:100)

g1(x)=1/(x^2+1)
g2(x)=1/(x^2+im)

using BenchmarkTools

println("function")
@btime g1(Ref(x)[]) setup=(x=1)
@btime g2(Ref(y)[]) setup=(y=2)
@btime g2(Ref(x)[]) * g2(Ref(y)[]) setup=(x=1; y=2)
@btime sum(g1(x) for x in 1:100) 
@btime sum(g2(x) for x in 1:100)
@btime sum(g1(x) * g2(x) for x in 1:100)

yields

lambda
  24.598 ns (1 allocation: 16 bytes)
  40.262 ns (1 allocation: 32 bytes)
  96.832 ns (3 allocations: 96 bytes)
  4.757 μs (199 allocations: 3.11 KiB)
  6.200 μs (199 allocations: 6.22 KiB)
  10.900 μs (399 allocations: 10.91 KiB)
function
  4.300 ns (0 allocations: 0 bytes)
  19.238 ns (0 allocations: 0 bytes)
  36.052 ns (0 allocations: 0 bytes)
  96.733 ns (0 allocations: 0 bytes)
  1.910 μs (0 allocations: 0 bytes)
  2.089 μs (0 allocations: 0 bytes)

Is this expected, or am I doing it wrong again? ;)

Edit: ah, I see, const-ing the lambda functions

const f1=x->1/(x^2+1)
const f2=x->1/(x^2+im)

shows the expected performance.
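
That fits, I think: a plain f1 = x -> ... is a non-const global binding, so its type could change at any moment and every call has to be dynamically dispatched (boxing the result, which would explain the allocations above), whereas named functions like g1 are const bindings automatically. In a fresh session this is easy to sanity-check:

julia> const f1 = x -> 1/(x^2 + 1);

julia> g1(x) = 1/(x^2 + 1);

julia> isconst(Main, :f1), isconst(Main, :g1)
(true, true)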


Ah, I see. Thanks for explaining why my benchmarking wasn't going to plan at all.