Dynamic dispatch in a for loop

drarnau · November 17, 2020, 9:23am

Hello,

While trying to optimise my code, Juno’s profiler flagged a portion of my code as doing some dynamic dispatch. Since my original code is fairly long, here is the smallest example I could come up with to illustrate what is going on:

using Profile

function gss(f::Function, a::Float64, b::Float64, tol::Float64 = 1e-5)::Float64
    Gratio = Base.MathConstants.golden
    c = b - (b - a) / Gratio
    d = a + (b - a) / Gratio
    while abs(c - d) > tol
        if f(c) > f(d)
            b = d
        else
            a = c
        end

        c = b - (b - a) / Gratio
        d = a + (b - a) / Gratio
    end
    return (b + a) / 2.0
end

V(x::Float64, y::Float64) = x^2.0 + y^2.0

function solve_gss(a::Float64, b::Float64, V::Function)
    # Define auxiliary function
    aux_f(x::Float64) = x + V(x,a)

    # Find maximum
    aux_x = gss(aux_f, a, b)

    return aux_f(aux_x), aux_x
end

function solve_z(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
    # Create grid for z
    grid_z = range(c, stop = d, length = gp)

    # Iterate over z
    aux_Vs = Vector{Float64}(undef, gp)
    aux_xs = Vector{Float64}(undef, gp)
    for (ind_z, z_td) in enumerate(grid_z)
        aux_Vs[ind_z], aux_xs[ind_z] = solve_gss(a, z_td, V)
    end

    # Find optimal z
    (maxV, maxLoc) = findmax(aux_Vs)

    return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end

@profiler for i = 1:10000
    solve_z(0.0, 0.0, 10.0, V)
end

According to my limited understanding of Juno’s profiler, the issue is on the following line within function solve_z:

 aux_Vs[ind_z], aux_zs[ind_z] = solve_gss(a, z_td, V)

I’m confused because as far as I can see solve_gss is simply returning a Tuple{Float64,Float64} which is then stored in two arrays. I’ve tried specifying the return of solve_gss as a Tuple{Float64,Float64}:

 solve_gss(a::Float64, b::Float64, V::Function)::Tuple{Float64,Float64}

But that does not help either.

Any help is greatly appreciated.

cstjean · November 17, 2020, 11:17am

I don’t think it’ll solve your specific problems, but AFAIK it is still true that with a function argument like V::Function, the compiler will not specialize on V. You need to write the ugly V::(T where T) instead to force specialization.

DNF · November 17, 2020, 11:31am

I don’t think that’s true. You don’t have to provide concrete types in function signatures to get specialization, and in any case T where T isn’t concrete either, it’s just the same as Any, or leaving out the type entirely.

My guess is that this line inside solve_gss

aux_f(x::Float64) = x + V(x,a)

creates a new function for each iteration, even though the function is actually the same each time. Can you pull the definition of aux_f outside of solve_gss, @drarnau ?

You can also get rid of some allocations by not creating the temporary arrays aux_Vs and aux_zs, and instead keeping track of the optimum value inside the loop.

Another two very minor things is that x^2 is better than x^2.0, you should almost never use the latter, and it’s good style to use a / 2 rather than a / 2.0 (this will not really affect performance in this particular code.)

drarnau · November 17, 2020, 2:34pm

Thanks for your reply. I’ve implemented your suggestion as:

function solve_gss_ugly(a::Float64, b::Float64, V::(T where T))
    # Define auxiliary function
    aux_f(x::Float64) = x + V(x,a)

    # Find maximum
    aux_x = gss(aux_f, a, b)

    return aux_f(aux_x), aux_x
end

function solve_z_ugly(a::Float64, c::Float64, d::Float64, V::(T where T), gp::Int64 = 100)
    # Create grid for z
    grid_z = range(c, stop = d, length = gp)

    # Iterate over z
    aux_Vs = Vector{Float64}(undef, gp)
    aux_xs = Vector{Float64}(undef, gp)
    for (ind_z, z_td) in enumerate(grid_z)
        aux_Vs[ind_z], aux_xs[ind_z] = solve_gss_ugly(a, z_td, V)
    end

    # Find optimal z
    (maxV, maxLoc) = findmax(aux_Vs)

    return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end

The profiler give me exactly the same as in my original implementation. Execution time and memory allocation look very similar too:

julia> @time solve_z(0.0, 0.0, 10.0, V)
  0.000057 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_ugly(0.0, 0.0, 10.0, V)
  0.000064 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

Any other idea?

kristoffer.carlsson · November 17, 2020, 2:55pm

If the function is called in the body, the function will specialize on the function type.

See Performance Tips · The Julia Language.

drarnau · November 17, 2020, 2:56pm

Thank you.

Unfortunately, I cannot pull out the definition of aux_f. In fact, that’s exactly in the nature of the problem I’m solving. I solve a version of the same problem N times. The difference between each problem is that I’m using a slightly different aux_f function and different bounds where to search a maximum. So, my code create many functions that are use just in one iteration. I don’t see a way around it, but maybe there is?

I tried to get rid of the temporary arrays by implementing:

function solve_z_noauxs(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
    # Create grid for z
    grid_z = range(c, stop = d, length = gp)

    # Iterate over z
    max_V::Float64 = -Inf
    opt_x::Float64 = NaN
    opt_z::Float64 = NaN
    for (ind_z, z_td) in enumerate(grid_z)
        temp_V, temp_x = solve_gss(a, z_td, V)

        if temp_V > max_V
            opt_x = temp_x
            opt_z = grid_z[ind_z]
            max_V = temp_V
        end
    end

    return max_V, opt_z, opt_x
end

I get the same profile and slightly more time/allocated memory:

julia> @time solve_z(0.0, 0.0, 10.0, V)
  0.000057 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_noauxs(0.0, 0.0, 10.0, V)
  0.000067 seconds (600 allocations: 10.938 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

Thank you for the suggestion on using x^2 and so on. My logic was that if I’m computing the power of a Float it’s better to use a Float. Same for dividing.

drarnau · November 17, 2020, 2:59pm

Thanks. Just to make sure I understand. Are you confirming that in my initial code the function will specialise?

rdeits · November 17, 2020, 3:02pm

This isn’t completely true, and :: Function is an important exception to the rule. See Performance Tips · The Julia Language

Benny · November 17, 2020, 3:27pm

I’m probably missing something but I don’t understand why you can’t pull this function out:

aux_f(x::Float64) = x + V(x,a)

to take the a value as an argument instead:

aux_f(x::Float64, a) = x + V(x,a)

so instead of remaking aux_f for each a you just feed the a value into the same aux_f. Youwould also have to edit the calls in gss from f(x) to f(x, a), but you wouldn’t even have to edit the header of gss because in solve_gss you pass the same a into the gss call.

DNF · November 17, 2020, 3:29pm

I should have said: in your example code you can pull the definition of aux_f out from solve_gss to solve_z, but I wasn’t certain if you could do that in general.

So when you say you cannot do it, do you mean that it’s impossible in your full code, unlike in this example?

drarnau · November 17, 2020, 4:03pm

Exactly. I know that in this example I can. In my full code, it’s something that I need to do.

drarnau · November 17, 2020, 4:04pm

Thanks for chipping in. This code is just a toy example of a much bigger code. I’m aware that here I could pull out the function. In my full code I can’t.

Benny · November 17, 2020, 4:15pm

Oh I see. It’d be cool if you could make a new example to show why you can’t, but that’s not necessarily related to your question.
I think you could still try lifting the function out in your original example just to see if it changes anything for the profiler and performance, just to check if the function-redefining is the source of what you’re seeing. If not, then we know for sure to look for the problem elsewhere.

drarnau · November 17, 2020, 4:40pm

Good point. When I change V and solve_gss as you suggest:

V(x::Float64) = x^2

function solve_gss_Vout(a::Float64, b::Float64, V::Function)
    # Find maximum
    aux_x = gss(V, a, b)

    return V(aux_x), aux_x
end

function solve_z_Vout(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
    # Create grid for z
    grid_z = range(c, stop = d, length = gp)

    # Iterate over z
    aux_Vs = Vector{Float64}(undef, gp)
    aux_xs = Vector{Float64}(undef, gp)
    for (ind_z, z_td) in enumerate(grid_z)
        aux_Vs[ind_z], aux_xs[ind_z] = solve_gss_Vout(a, z_td, V)
    end

    # Find optimal z
    (maxV, maxLoc) = findmax(aux_Vs)

    return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end

The profiler looks exactly the same as in my original example. Time and memory allocation seem very similar:

julia> @time solve_z(0.0, 0.0, 10.0, V)
  0.000054 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_Vout(0.0, 0.0, 10.0, V)
  0.000051 seconds (502 allocations: 11.125 KiB)
(99.99963159887011, 10.0, 9.99998157992654)

This suggests that is not about the creation of a new function in each call.

DNF · November 17, 2020, 5:17pm

I made a version of this that pulled out the function definition (and removed the temporary arrays) which produced zero allocations. Unfortunately, I’m not at my computer now.

The speedup was not dramatic, though, maybe 30%

drarnau · November 17, 2020, 5:26pm

Great! I look forward to seeing your code. Thanks.

DNF · November 17, 2020, 7:01pm

function gss(f::Function, a, b, tol = 1e-5)
    ϕ = MathConstants.golden
    c = b - (b - a) / ϕ
    d = a + (b - a) / ϕ
    while abs(c - d) > tol
        if f(c) > f(d)
            b = d
        else
            a = c
        end
        c = b - (b - a) / ϕ
        d = a + (b - a) / ϕ
    end
    return (b + a) / 2
end

V(x, y) = x^2 + y^2

function solve_gss(a, b, g::Function)
    x = gss(g, a, b)
    return g(x), x
end

function solve_z(a, c, d, f::Function, gp::Int = 100)
    local z, zs
    grid_z = range(c, d; length = gp)
    Vs = -Inf
    for z_ in grid_z
        Vs_, zs_ = solve_gss(a, z_, x->x+f(x, a))
        if Vs_ > Vs
            Vs, z, zs = Vs_, z_, zs_
        end
    end
    return Vs, z, zs
end

julia> @btime solve_z_old(0.0, 0.0, 10.0, V)
  42.033 μs (503 allocations: 11.16 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @btime solve_z(0.0, 0.0, 10.0, V)
  31.975 μs (0 allocations: 0 bytes)
(109.99961317879665, 10.0, 9.99998157992654)

drarnau · November 18, 2020, 6:03pm

Thanks. I’ve played around with your code. The reduction in allocation is basically due to the fact that in this example you can define the function that is passed to solve_gss as x->x+f(x, a) which in my full code I can’t do.

Topic		Replies	Views
Unsolvable dynamic dispatch in a `for` loop Performance question	3	513	November 23, 2020
What is the best way to re-use a temporary vector Performance performance , memory-allocation	16	338	October 22, 2024
Performance issue due to function as an argument General Usage question , performance	16	844	September 22, 2023
Save dispatched function for later call General Usage	46	1660	September 1, 2021
Avoiding Dynamic dispatch General Usage	11	4329	October 13, 2017

Dynamic dispatch in a for loop

Related topics