# Does the cost of function call matter in Julia?

I’ve written a long function, and according to https://docs.julialang.org/en/latest/manual/performance-tips/, I rewrote it with some functions to avoid the global variables. The performance is nicer now. My questions are, does the cost of function call matter in Julia? Is writing so many function necessary? how can I optimize my code (I found the line @. prop_e = cis( ele * (xplusy * dt/2.0)) and the “get momentum” part might be the bottleneck and the @threading strategy seems unsuitable because the gc time is very large)?

The code is

``````function evolution(u::Array{Complex128})
t = Array((1:nt) * dt)
ele = map(GetEleField, t)
save("elefield.jld", "t", t, "ele", ele)

x, p = xpgrid()
########### common part
p_fft! = plan_fft!( similar(u), flags=FFTW.MEASURE )
prop_x, prop_p = operation_real_evolution(x, p)
prop_e = similar( prop_x )

r_a = 750 #ngrid * dx /2.0 - 80.0
δ = 15.0
r_sp = 120.0
δ_sp = 10.0

absorb = Array(Float64, ngrid, ngrid)
splitter = Array(Float64, ngrid, ngrid)
xplusy = Array(Float64, ngrid, ngrid)
for j in 1:ngrid
for i in 1:ngrid
xplusy[i, j] = x[i] + x[j]
absorb[i, j] = (1.0 - out(x[i], r_a, δ ))* (1.0 - out(x[j], r_a, δ))
splitter[i, j] = out(x[i], r_sp, δ_sp) * out(x[j], r_sp, δ_sp)
end
end

##### splitting
pp = similar(p)
A = -cumsum(ele) * dt
A² = A.*A
uo = zeros(Complex128, ngrid, ngrid)
up = zeros(Complex128, ngrid, ngrid)
#####
for i in 1:nt
step_evolution_real!(u, p_fft!, prop_x, prop_p, ele[i], xplusy, absorb, prop_e)

########## get momentum
if i % 3 == 0
@. uo = (u * splitter) * cis(A[i] * xplusy)

p_fft! * uo
pp .= (p.^2/2.0) * (nt-i) .+ p*sum(@view A[i:nt]) + sum(@view A²[i:nt])/2.0
for j2 in 1:ngrid
for j1 in 1:ngrid
uo[j1, j2] = uo[j1, j2] * cis( -dt * (pp[j1] + pp[j2]) )
end
end
up .+= uo
end

########### end get momentum

if (mod(i, 400) == 0)
println(100*i/real(nt),"%,      time = ", now())
end
end

nothing
end

out(x, r_a, δ) = 1.0 ./ (1.0 + exp( -(abs(x) - r_a) ./δ ))

function step_evolution_real!(u::Array{Complex128, 2},
p_fft!::Base.DFT.FFTW.cFFTWPlan{Complex128, -1, true, 2},
prop_x::Array{Complex128, 2}, prop_p::Array{Complex128, 2},
ele::Float64, xplusy::Array{Float64, 2},
absorb::Array{Float64, 2}, prop_e::Array{Complex128, 2})
@. prop_e = cis( ele * (xplusy * dt/2.0))
u .*= prop_x .* prop_e

p_fft! * u
u .*= prop_p

p_fft! \ u
@. u *= prop_x * prop_e

u .*= absorb
nothing
end

function operation_real_evolution(x::Array{Float64, 1}, p::Array{Float64, 1})
prop_x = Array{Complex128}(ngrid , ngrid)
prop_p = Array{Complex128}(ngrid , ngrid)

for j in 1:ngrid
for i in 1:ngrid
prop_x[i, j] = cis( -(dt / 2) * Potential(x[i], x[j]) )
prop_p[i, j] = cis( -(dt / 2) * (p[i]^2 + p[j]^2) )
end
end
prop_x, prop_p
end

``````

Small functions are generally in-lined, so there is no call-cost for them. If you are not satisfied with the inline-heuristics then decorate your function with `@inline` or `@noinline`.

Without having read your code-example, this suggests that you can improve your algorithm with per-allocating work arrays and then using them. Thus no new data gets allocated during a loop and GC time should be small.

At least from your code sample it seems that `ngrid` and `nt` are still global variables. This could explain the large gc time.

1 Like

As far as i know, global variables won’t cause performance problems if they are not Vectors or Matrix. Will Numbers cause preallocation problem?

Any non-`const` global variable might cause inference issues which propagate through your code, so yes, you should definitely take care when using them.

I think you misunderstand about globals, please read the relevant part the performance tips (better, read the whole thing).

In many compiled but dynamically types languages, making large, “monolithic” functions is advantageous since it allows the compiler to figure out type information. With Julia, the situation is the opposite: the language is designed to work with type information as efficiently as statically typed languages, you just have to make this possible. One technique for that is actually factoring out behavior into smaller functions, called function barriers.

In my code, I have factored the behaviors into small functions. So the bottleneck is not about function calls.

In another module, I already declared them as

``````const ngrid=2^13
const nt = 10000
``````

and used this module when calling the function evolution()
and add “const” before r_a, δ, r_sp, δ_sp doesn’t make any difference.

Actually, the code I show here is fast enough, which is about 1.7x slower than the Fortran version with “-O3 -mkl -parallel”. However, I just can’t be easily satisfied…

In that case, perhaps try profiling. When you get within 2x Fortran or C, it becomes harder to talk about code optimization in abstract terms, the solution may be specific to your code.

1 Like