The huge difference of two functions for the same goal inside and outside of function

swish47 · October 27, 2022, 6:46pm

There are two functions that compute the same array. However, their elapsed times behave strangely.
Inside function:

using FLoops
ot=collect(range(start=-3,length=3600*4,stop=3));
st=rand(1000000);
sgf::Vector{ComplexF64}=zeros(ComplexF64, length(ot));

  function work1(sgf::AbstractVector, st::AbstractVector, ot::AbstractVector)
    @floop @simd for nn in eachindex(ot)
    @inbounds @fastmath sgf[nn] = ComplexF64(sum(inv.(ComplexF64(ot[nn]+0.001*im).-st)));
    end
  end
  @time work1(sgf, st, ot);
  64.268151 seconds (2.32 M allocations: 214.695 GiB, 13.04% gc time, 0.80% compilation time)
  
  function work2(sgf::AbstractVector, st::AbstractVector, ot::AbstractVector)
    @floop @simd for nn in eachindex(ot)
      @inbounds @fastmath sgf[nn] = sum(inv(ot[nn] + 0.001*im - s) for s in st)
    end
  end
  @time work2(sgf, st, ot);
  19.978370 seconds (774.47 k allocations: 41.673 MiB, 0.97% compilation time)

Outside function:

@time @floop @simd for nn in 1:length(ot)
    @inbounds @fastmath sgf[nn]::ComplexF64=ComplexF64(sum(inv.(ComplexF64(ot[nn]+0.001*im).-st)));
end
64.260086 seconds (828.07 k allocations: 214.616 GiB, 12.96% gc time, 0.23% compilation time)

@time @floop @simd for nn in eachindex(ot)
      @inbounds @fastmath sgf[nn] = sum(inv(ot[nn] + 0.001*im - s) for s in st)
    end
2062.653257 seconds (100.29 G allocations: 2.298 TiB, 14.82% gc time, 0.02% compilation time)

What causes the same commands to have hugely different time costs inside and outside a function? Is it only because of global variables? What is the mechanism?

lmiq · October 27, 2022, 6:58pm

The specialization of the operations to the types of the variables. Here is one take on that, which I wrote for myself when I was starting with Julia: Type instability · JuliaNotes.jl. You will find a lot of information about specialization and type stability in Julia; it is a fundamental characteristic of how Julia works and what allows it to be fast.

swish47 · October 27, 2022, 7:06pm

Thanks for your reply. But in my code, I already declared all variables with a specific type. As you can see, I solved this with two methods, vectorized and loop. The vectorized method shows the same performance whether inside or outside a function. How can this be explained? Only the loop method shows a 100x difference.

lmiq · October 27, 2022, 7:11pm

That's because the vectorization introduces a function barrier, which confines the type instability to the “surface” of the code instead of the loop. When you use a broadcasted operation like:

x .= 1.0

it is as if you were doing

function f!(x)
    # assign 1.0 (converted to the element type) to every element in place
    for i in eachindex(x)
        x[i] = one(eltype(x))
    end
    return x
end
f!(x)

Thus the function is implicit there. This is much the same mechanism that makes vectorized code fast and loops slow in other languages (the vectorized calls run specialized versions of the operations). The difference is that in Julia you can write the functions that do those operations in Julia itself, as long as the code can be compiled for the types of the variables involved, which essentially requires it to sit inside a type-stable block (a function, for instance).
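To make the function barrier concrete, here is a minimal sketch, not the original poster's code. The names `data` and `sum_inv` are made up for illustration, and the array is kept small so it runs quickly. The global is untyped, but the loop lives in a function that gets compiled for the concrete argument types, so only the single call at top level has to look up the type at run time.

```julia
# Hypothetical example: `data` and `sum_inv` are illustrative names.
# An untyped global: top-level code cannot assume its type.
data = rand(1000)

# The kernel is compiled for the concrete types it is called with,
# so the loop body is type-stable.
function sum_inv(z::ComplexF64, v::AbstractVector{<:Real})
    acc = zero(ComplexF64)
    for s in v
        acc += inv(z - s)
    end
    return acc
end

# One dynamic dispatch happens at this call; the loop itself then runs
# in specialized code.
result = sum_inv(0.5 + 0.001im, data)
```

Writing the same loop directly at top level, reading `data` on every iteration, would pay the run-time type lookup inside the loop instead of once at the call.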

lmiq · October 27, 2022, 7:40pm

Note that you did not declare the types of ot and st. If you do that, for example with st::Vector{Float64} = rand(...), the difference in performance is drastically reduced. Still, it is good practice to just put everything inside functions.
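A small sketch of how the declaration changes what the compiler can infer, assuming Julia 1.8 or later (typed globals are not available before that). All names here are hypothetical, and `Base.return_types` is used only as a quick way to inspect inference; it is not exported.

```julia
# Hypothetical names; requires Julia 1.8+ for the typed global.
untyped_st = rand(100)                  # type may change at any time
typed_st::Vector{Float64} = rand(100)   # type is fixed, value may change
const const_st = rand(100)              # binding is constant

read_untyped() = sum(untyped_st)
read_typed()   = sum(typed_st)
read_const()   = sum(const_st)

# Ask the compiler which return type it infers for each function.
inferred_untyped = first(Base.return_types(read_untyped, Tuple{}))
inferred_typed   = first(Base.return_types(read_typed, Tuple{}))
inferred_const   = first(Base.return_types(read_const, Tuple{}))

println((inferred_untyped, inferred_typed, inferred_const))
```

With the typed or `const` global, the result should be inferred as a concrete `Float64`, while the untyped global leaves the compiler with a non-concrete type, which is what forces dynamic dispatch and allocations inside a loop.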

swish47 · October 27, 2022, 8:22pm

Thanks for your help. You are right, I forgot to declare those two. After doing that, the loop method becomes as fast as inside the function, and the vectorized version doesn’t change.
