Huge performance difference for the same loop inside and outside a function

There are two ways of computing the same array, each run both inside and outside a function. However, the elapsed times behave strangely.
Inside function:

using FLoops
ot=collect(range(start=-3,length=3600*4,stop=3));
st=rand(1000000);
sgf::Vector{ComplexF64}=zeros(ComplexF64, length(ot));

  function work1(sgf::AbstractVector, st::AbstractVector, ot::AbstractVector)
    @floop @simd for nn in eachindex(ot)
    @inbounds @fastmath sgf[nn] = ComplexF64(sum(inv.(ComplexF64(ot[nn]+0.001*im).-st)));
    end
  end
  @time work1(sgf, st, ot);
  64.268151 seconds (2.32 M allocations: 214.695 GiB, 13.04% gc time, 0.80% compilation time)
  
  function work2(sgf::AbstractVector, st::AbstractVector, ot::AbstractVector)
    @floop @simd for nn in eachindex(ot)
      @inbounds @fastmath sgf[nn] = sum(inv(ot[nn] + 0.001*im - s) for s in st)
    end
  end
  @time work2(sgf, st, ot);
  19.978370 seconds (774.47 k allocations: 41.673 MiB, 0.97% compilation time)

Outside function:

@time @floop @simd for nn in 1:length(ot)
    @inbounds @fastmath sgf[nn]::ComplexF64=ComplexF64(sum(inv.(ComplexF64(ot[nn]+0.001*im).-st)));
end
64.260086 seconds (828.07 k allocations: 214.616 GiB, 12.96% gc time, 0.23% compilation time)

@time @floop @simd for nn in eachindex(ot)
    @inbounds @fastmath sgf[nn] = sum(inv(ot[nn] + 0.001*im - s) for s in st)
end
2062.653257 seconds (100.29 G allocations: 2.298 TiB, 14.82% gc time, 0.02% compilation time)

What causes the same commands to have such hugely different time costs inside and outside a function? Is it only because of global variables? What is the mechanism?

The mechanism is the specialization of the operations to the types of the variables. Here is one take on that, written for myself when I was starting with Julia: Type instability · JuliaNotes.jl. You will find a lot of information about specialization and type stability in Julia; it is a fundamental characteristic of how Julia works and what allows it to be fast.


Thanks for your reply. But in my code, I already declared all variables with a concrete type. And as you can see, I solve this with two methods, a vectorized one and a loop. The vectorized method shows the same performance whether it runs inside or outside a function. How can this be explained? Only the loop method shows the 100x difference.

That is because the vectorization introduces a function barrier, which restricts the type instability to the “surface” of the code rather than the inner loop. When you use a broadcasted operation like:

x .= 1.0

it is as if you were doing

function f(x)
    # the loop runs inside a function, compiled for the concrete type of x
    for i in eachindex(x)
        x[i] = 1.0
    end
    return x
end
f(x)

Thus the function is implicit there. This is much the same mechanism that makes vectorized code fast and loops slow in other languages (the vectorized calls run specialized versions of the operations). The difference is that in Julia you can write the functions that do those operations in Julia itself, as long as the code can be compiled for the types of the variables involved, which essentially depends on the code being within a type-stable block (a function, for instance).
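As a rough illustration of the function barrier described above, here is a minimal sketch. The names `data`, `sum_global`, and `sum_arg` are made up for this example and are not from the original code. The first function reads an untyped global directly, so the compiler cannot know the element type inside the loop; the second receives the same array as an argument and is compiled for its concrete type.

```julia
# Untyped, non-constant global: its type could change at any time,
# so code that reads it directly cannot be specialized.
data = rand(1000)

# Reads the global inside the loop: every `v` is of unknown type to the compiler.
function sum_global()
    s = 0.0
    for v in data
        s += v
    end
    return s
end

# Function barrier: the array is passed as an argument, so this method
# is compiled for the concrete type Vector{Float64}.
function sum_arg(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

sum_global()   # correct result, but slow and allocating for large inputs
sum_arg(data)  # same result; only the call itself is dynamically dispatched
```

Running `@code_warntype sum_global()` in the REPL should flag `data` as `Any`, while `@code_warntype sum_arg(data)` should show concrete types throughout.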

Note that you did not declare the types of ot and st. If you do that, for example with st::Vector{Float64} = rand(...), the difference in performance is drastically reduced. Still, it is good practice to just put everything inside functions.
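A minimal sketch of that fix, assuming Julia 1.8 or later (typed globals are not available before that). The names `st_typed`, `ot_const`, and `out` are hypothetical, and the sizes are shrunk so it runs quickly. `const` is shown as an alternative way to give a global a fixed type.

```julia
# Typed global: the binding can be reassigned, but only to a Vector{Float64}.
st_typed::Vector{Float64} = rand(1000)

# Constant global: the binding is fixed, so its type is always known.
const ot_const = collect(range(-3, 3, length=10))

# Typed output buffer, as in the original post.
out::Vector{ComplexF64} = zeros(ComplexF64, length(ot_const))

# Top-level loop: with all globals typed or const, the generator passed to
# `sum` sees concrete types instead of `Any`.
for nn in eachindex(ot_const)
    out[nn] = sum(inv(ot_const[nn] + 0.001im - s) for s in st_typed)
end
```

Wrapping the loop in a function that takes the arrays as arguments, like `work2` above, remains the more robust option, since it does not depend on how the globals were declared.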

Thanks for your help. You are right, I forgot to declare those two. After doing this, the loop method becomes as fast as it is inside a function, and the vectorized version doesn’t change.