Apparent mismatch of run times during summation

Hi guys, Julia newbie here. I have a very simple question that I haven’t been able to get my head around.

I ultimately want to compute the sum of f(n)*g(n)*x[n-1], where n ranges from 1 to a predetermined integer, f and g are functions, and x is a predetermined array.
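Schematically, the goal looks something like this (N, x, f, and g here are placeholders of my own, and since Julia arrays are 1-based, the x[n-1] term needs n to start at 2 unless x is an offset array):

N = 100                                   # predetermined upper limit
x = rand(N)                               # placeholder data
f(n) = 1 / (n^2 + 1)                      # placeholder functions
g(n) = 1 / (n^2 + im)
total = sum(f(n) * g(n) * x[n-1] for n in 2:N)

I will, however, illustrate my confusion with a very simple example. Suppose I have two functions I wish to sum, f1 and f2.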

f1=x->1/(x^2+1)
f2=x->1/(x^2+im)

Now, suppose I write

using BenchmarkTools
@btime f2(1)

my output is, unsurprisingly,

0.001 ns (0 allocations: 0 bytes)

Same with f1(1). If I now write

@btime f2(1) + f2(2)

I still find

0.001 ns (0 allocations: 0 bytes)

If I time

@btime sum(f2(x) for x in 1:100)

I now find

2.311 μs (0 allocations: 0 bytes)

Arithmetic with complex numbers appears to take 0.001 ns each time I measure it, and calling the function also takes 0.001 ns, so taking those runtimes at face value, summing 100 complex numbers should take less than 1 ns. Yet it takes more than 1,000 times what this naive estimate suggests.

In addition, the code

@btime sum(f1(x) for x in 1:100)

outputs

88.168 ns (0 allocations: 0 bytes)

So, there are a few things that I want to ask.

  1. Why is my summation taking significantly more time than the time it takes to calculate each function and sum 100 complex numbers?
  2. Why is there such a large discrepancy between the real and complex functions? Complex arithmetic should not be 20 times slower.
  3. Is something else happening under the hood that I’m not aware of? For instance, I have read that summation incorporates special techniques to reduce cancellation errors.
  4. Is there a way for me to actually sum this function in <= 1 ns, as I expect to be able to do?

Any help would be greatly appreciated!

@btime is lying to you. Whenever you see a number less than 1 ns, it means the compiler has replaced your calculation with a constant value.
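You can actually watch this happen with @code_llvm, using a small wrapper (h and wrapped are just names made up for this demo):

julia> h(x) = 1/(x^2 + 1)
h (generic function with 1 method)

julia> wrapped() = h(1)   # literal argument, like the one in @btime f1(1)
wrapped (generic function with 1 method)

julia> @code_llvm wrapped()   # body should reduce to a bare `ret double 5.0e-1`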


Thanks for that! I did not know that @btime did that.

However, let us assume that the calculation of f2(1) takes only slightly less than 1 ns, and that adding two complex numbers also takes slightly less than 1 ns. Shouldn't 100 of these operations then take around 200 ns, not 2,300 ns?

It’s not that @btime does this, it’s constant propagation in the compiler. When you have simple functions, you can prevent the compiler from constant-propagating into the benchmark by benchmarking like so:

julia> @btime f1($Ref(1)[])
  29.406 ns (1 allocation: 16 bytes)
0.5

julia> @btime f2($Ref(1)[])
  37.100 ns (1 allocation: 32 bytes)
0.5 - 0.5im
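
Side note: the spelling the BenchmarkTools manual itself uses is $(Ref(1))[], which constructs the Ref once at interpolation time rather than on every evaluation:

julia> @btime f1($(Ref(1))[])   # timing omitted; it will vary by machine

Either way, the point is that the argument is no longer a compile-time constant.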

Hm, when checking this I noticed

f1=x->1/(x^2+1)
f2=x->1/(x^2+im)

using BenchmarkTools

println("lambda")
@btime f1(Ref(x)[]) setup=(x=1)
@btime f2(Ref(y)[]) setup=(y=2)
@btime f2(Ref(x)[]) * f2(Ref(y)[]) setup=(x=1; y=2)
@btime sum(f1(x) for x in 1:100) 
@btime sum(f2(x) for x in 1:100)
@btime sum(f1(x) * f2(x) for x in 1:100)

g1(x)=1/(x^2+1)
g2(x)=1/(x^2+im)

using BenchmarkTools

println("function")
@btime g1(Ref(x)[]) setup=(x=1)
@btime g2(Ref(y)[]) setup=(y=2)
@btime g2(Ref(x)[]) * g2(Ref(y)[]) setup=(x=1; y=2)
@btime sum(g1(x) for x in 1:100) 
@btime sum(g2(x) for x in 1:100)
@btime sum(g1(x) * g2(x) for x in 1:100)

yields

lambda
  24.598 ns (1 allocation: 16 bytes)
  40.262 ns (1 allocation: 32 bytes)
  96.832 ns (3 allocations: 96 bytes)
  4.757 μs (199 allocations: 3.11 KiB)
  6.200 μs (199 allocations: 6.22 KiB)
  10.900 μs (399 allocations: 10.91 KiB)
function
  4.300 ns (0 allocations: 0 bytes)
  19.238 ns (0 allocations: 0 bytes)
  36.052 ns (0 allocations: 0 bytes)
  96.733 ns (0 allocations: 0 bytes)
  1.910 μs (0 allocations: 0 bytes)
  2.089 μs (0 allocations: 0 bytes)

Is this expected, or am I doing it wrong again? ;)

Edit: ah, I see, const-ing the lambda functions

const f1=x->1/(x^2+1)
const f2=x->1/(x^2+im)

shows the expected performance.
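
That fits, I think: a plain f1 = x -> ... is a non-const global binding, so its type could change at any moment and every call has to be dynamically dispatched (boxing the result, which would explain the allocations above), whereas named functions like g1 are const bindings automatically. In a fresh session this is easy to sanity-check:

julia> const f1 = x -> 1/(x^2 + 1);

julia> g1(x) = 1/(x^2 + 1);

julia> isconst(Main, :f1), isconst(Main, :g1)
(true, true)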


Ah, I see. Thanks for explaining why my benchmarking wasn't going to plan at all.