Does the cost of function call matter in Julia?

111 · November 10, 2017, 9:34am

I’ve written a long function, and according to https://docs.julialang.org/en/latest/manual/performance-tips/, I rewrote it with some functions to avoid the global variables. The performance is nicer now. My questions are, does the cost of function call matter in Julia? Is writing so many function necessary? how can I optimize my code (I found the line @. prop_e = cis( ele * (xplusy * dt/2.0)) and the “get momentum” part might be the bottleneck and the @threading strategy seems unsuitable because the gc time is very large)?

The code is

function evolution(u::Array{Complex128})
    t = Array((1:nt) * dt)
    ele = map(GetEleField, t)
    save("elefield.jld", "t", t, "ele", ele)

    FFTW.set_num_threads(Sys.CPU_CORES)

    x, p = xpgrid()
    ########### common part
    p_fft! = plan_fft!( similar(u), flags=FFTW.MEASURE )
    prop_x, prop_p = operation_real_evolution(x, p)
    prop_e = similar( prop_x )

    r_a = 750 #ngrid * dx /2.0 - 80.0
    δ = 15.0
    r_sp = 120.0
    δ_sp = 10.0

    absorb = Array(Float64, ngrid, ngrid)
    splitter = Array(Float64, ngrid, ngrid)
    xplusy = Array(Float64, ngrid, ngrid)
    for j in 1:ngrid
        for i in 1:ngrid
            xplusy[i, j] = x[i] + x[j]
            absorb[i, j] = (1.0 - out(x[i], r_a, δ ))* (1.0 - out(x[j], r_a, δ))
            splitter[i, j] = out(x[i], r_sp, δ_sp) * out(x[j], r_sp, δ_sp)
        end
    end

    ##### splitting
    pp = similar(p)
    A = -cumsum(ele) * dt
    A² = A.*A
    uo = zeros(Complex128, ngrid, ngrid)
    up = zeros(Complex128, ngrid, ngrid)
    #####
    for i in 1:nt
        step_evolution_real!(u, p_fft!, prop_x, prop_p, ele[i], xplusy, absorb, prop_e)

        ########## get momentum
        if i % 3 == 0
            @. uo = (u * splitter) * cis(A[i] * xplusy)

            p_fft! * uo
            pp .= (p.^2/2.0) * (nt-i) .+ p*sum(@view A[i:nt]) + sum(@view A²[i:nt])/2.0
            for j2 in 1:ngrid
                for j1 in 1:ngrid
                    uo[j1, j2] = uo[j1, j2] * cis( -dt * (pp[j1] + pp[j2]) )
                end
            end
            up .+= uo
        end

        ########### end get momentum

        if (mod(i, 400) == 0)
            println(100*i/real(nt),"%,      time = ", now())
        end
    end

    nothing
end

out(x, r_a, δ) = 1.0 ./ (1.0 + exp( -(abs(x) - r_a) ./δ ))

function step_evolution_real!(u::Array{Complex128, 2},
                              p_fft!::Base.DFT.FFTW.cFFTWPlan{Complex128, -1, true, 2},
                              prop_x::Array{Complex128, 2}, prop_p::Array{Complex128, 2},
                              ele::Float64, xplusy::Array{Float64, 2},
                              absorb::Array{Float64, 2}, prop_e::Array{Complex128, 2})
    @. prop_e = cis( ele * (xplusy * dt/2.0))
    u .*= prop_x .* prop_e

    p_fft! * u
    u .*= prop_p

    p_fft! \ u
    @. u *= prop_x * prop_e

    u .*= absorb
    nothing
end

function operation_real_evolution(x::Array{Float64, 1}, p::Array{Float64, 1})
    prop_x = Array{Complex128}(ngrid , ngrid)
    prop_p = Array{Complex128}(ngrid , ngrid)

    for j in 1:ngrid
        for i in 1:ngrid
            prop_x[i, j] = cis( -(dt / 2) * Potential(x[i], x[j]) )
            prop_p[i, j] = cis( -(dt / 2) * (p[i]^2 + p[j]^2) )
        end
    end
    prop_x, prop_p
end

mauro3 · November 10, 2017, 10:25am

Small functions are generally in-lined, so there is no call-cost for them. If you are not satisfied with the inline-heuristics then decorate your function with @inline or @noinline.

Without having read your code-example, this suggests that you can improve your algorithm with per-allocating work arrays and then using them. Thus no new data gets allocated during a loop and GC time should be small.

saschatimme · November 10, 2017, 10:36am

At least from your code sample it seems that ngrid and nt are still global variables. This could explain the large gc time.

111 · November 10, 2017, 10:39am

As far as i know, global variables won’t cause performance problems if they are not Vectors or Matrix. Will Numbers cause preallocation problem?

pfitzseb · November 10, 2017, 10:55am

Any non-const global variable might cause inference issues which propagate through your code, so yes, you should definitely take care when using them.

Tamas_Papp · November 10, 2017, 10:55am

I think you misunderstand about globals, please read the relevant part the performance tips (better, read the whole thing).

In many compiled but dynamically types languages, making large, “monolithic” functions is advantageous since it allows the compiler to figure out type information. With Julia, the situation is the opposite: the language is designed to work with type information as efficiently as statically typed languages, you just have to make this possible. One technique for that is actually factoring out behavior into smaller functions, called function barriers.

111 · November 10, 2017, 11:04am

In my code, I have factored the behaviors into small functions. So the bottleneck is not about function calls.

111 · November 10, 2017, 11:17am

In another module, I already declared them as

const ngrid=2^13
const nt = 10000

and used this module when calling the function evolution()
and add “const” before r_a, δ, r_sp, δ_sp doesn’t make any difference.

111 · November 10, 2017, 11:29am

Actually, the code I show here is fast enough, which is about 1.7x slower than the Fortran version with “-O3 -mkl -parallel”. However, I just can’t be easily satisfied…

Tamas_Papp · November 10, 2017, 12:05pm

In that case, perhaps try profiling. When you get within 2x Fortran or C, it becomes harder to talk about code optimization in abstract terms, the solution may be specific to your code.

Topic		Replies	Views
Memory Pre-allocation in the global scope New to Julia question	20	1201	August 18, 2022
Contents of the function run well when timed alone, but not when they are in a loop inside the function Performance question , performance , arrays , function , sparse	8	903	May 31, 2022
The huge difference of two functions for the same goal inside and outside of function Performance	5	271	October 27, 2022
Execution speed - can someone explain the difference? New to Julia	9	486	April 8, 2024
Achieving C++ Speeds for a CFD Algorithm: Julia spends too much time on pre-allocation? Performance	22	3633	November 28, 2018

Does the cost of function call matter in Julia?

Related topics