# Dynamic dispatch in a for loop

Hello,

While trying to optimise my code, Juno’s profiler flagged a portion of my code as doing some dynamic dispatch. Since my original code is fairly long, here is the smallest example I could come up with to illustrate what is going on:

``````using Profile

function gss(f::Function, a::Float64, b::Float64, tol::Float64 = 1e-5)::Float64
Gratio = Base.MathConstants.golden
c = b - (b - a) / Gratio
d = a + (b - a) / Gratio
while abs(c - d) > tol
if f(c) > f(d)
b = d
else
a = c
end

c = b - (b - a) / Gratio
d = a + (b - a) / Gratio
end
return (b + a) / 2.0
end

V(x::Float64, y::Float64) = x^2.0 + y^2.0

function solve_gss(a::Float64, b::Float64, V::Function)
# Define auxiliary function
aux_f(x::Float64) = x + V(x,a)

# Find maximum
aux_x = gss(aux_f, a, b)

return aux_f(aux_x), aux_x
end

function solve_z(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
# Create grid for z
grid_z = range(c, stop = d, length = gp)

# Iterate over z
aux_Vs = Vector{Float64}(undef, gp)
aux_xs = Vector{Float64}(undef, gp)
for (ind_z, z_td) in enumerate(grid_z)
aux_Vs[ind_z], aux_xs[ind_z] = solve_gss(a, z_td, V)
end

# Find optimal z
(maxV, maxLoc) = findmax(aux_Vs)

return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end

@profiler for i = 1:10000
solve_z(0.0, 0.0, 10.0, V)
end
``````

According to my limited understanding of Juno’s profiler, the issue is on the following line within function `solve_z`:

`````` aux_Vs[ind_z], aux_zs[ind_z] = solve_gss(a, z_td, V)
``````

I’m confused because as far as I can see `solve_gss` is simply returning a `Tuple{Float64,Float64}` which is then stored in two arrays. I’ve tried specifying the return of `solve_gss` as a `Tuple{Float64,Float64}`:

`````` solve_gss(a::Float64, b::Float64, V::Function)::Tuple{Float64,Float64}
``````

But that does not help either.

Any help is greatly appreciated.

I don’t think it’ll solve your specific problems, but AFAIK it is still true that with a function argument like `V::Function`, the compiler will not specialize on `V`. You need to write the ugly `V::(T where T)` instead to force specialization.

1 Like

I don’t think that’s true. You don’t have to provide concrete types in function signatures to get specialization, and in any case `T where T` isn’t concrete either, it’s just the same as `Any`, or leaving out the type entirely.

My guess is that this line inside `solve_gss`

``````aux_f(x::Float64) = x + V(x,a)
``````

creates a new function for each iteration, even though the function is actually the same each time. Can you pull the definition of `aux_f` outside of `solve_gss`, @drarnau ?

You can also get rid of some allocations by not creating the temporary arrays `aux_Vs` and `aux_zs`, and instead keeping track of the optimum value inside the loop.

Another two very minor things is that `x^2` is better than `x^2.0`, you should almost never use the latter, and it’s good style to use `a / 2` rather than `a / 2.0` (this will not really affect performance in this particular code.)

3 Likes

``````function solve_gss_ugly(a::Float64, b::Float64, V::(T where T))
# Define auxiliary function
aux_f(x::Float64) = x + V(x,a)

# Find maximum
aux_x = gss(aux_f, a, b)

return aux_f(aux_x), aux_x
end

function solve_z_ugly(a::Float64, c::Float64, d::Float64, V::(T where T), gp::Int64 = 100)
# Create grid for z
grid_z = range(c, stop = d, length = gp)

# Iterate over z
aux_Vs = Vector{Float64}(undef, gp)
aux_xs = Vector{Float64}(undef, gp)
for (ind_z, z_td) in enumerate(grid_z)
aux_Vs[ind_z], aux_xs[ind_z] = solve_gss_ugly(a, z_td, V)
end

# Find optimal z
(maxV, maxLoc) = findmax(aux_Vs)

return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end
``````

The profiler give me exactly the same as in my original implementation. Execution time and memory allocation look very similar too:

``````julia> @time solve_z(0.0, 0.0, 10.0, V)
0.000057 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_ugly(0.0, 0.0, 10.0, V)
0.000064 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)
``````

Any other idea?

If the function is called in the body, the function will specialize on the function type.

4 Likes

Thank you.

Unfortunately, I cannot pull out the definition of `aux_f`. In fact, that’s exactly in the nature of the problem I’m solving. I solve a version of the same problem `N` times. The difference between each problem is that I’m using a slightly different `aux_f` function and different bounds where to search a maximum. So, my code create many functions that are use just in one iteration. I don’t see a way around it, but maybe there is?

I tried to get rid of the temporary arrays by implementing:

``````function solve_z_noauxs(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
# Create grid for z
grid_z = range(c, stop = d, length = gp)

# Iterate over z
max_V::Float64 = -Inf
opt_x::Float64 = NaN
opt_z::Float64 = NaN
for (ind_z, z_td) in enumerate(grid_z)
temp_V, temp_x = solve_gss(a, z_td, V)

if temp_V > max_V
opt_x = temp_x
opt_z = grid_z[ind_z]
max_V = temp_V
end
end

return max_V, opt_z, opt_x
end
``````

I get the same profile and slightly more time/allocated memory:

``````julia> @time solve_z(0.0, 0.0, 10.0, V)
0.000057 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_noauxs(0.0, 0.0, 10.0, V)
0.000067 seconds (600 allocations: 10.938 KiB)
(109.99961317879665, 10.0, 9.99998157992654)
``````

Thank you for the suggestion on using `x^2` and so on. My logic was that if I’m computing the power of a Float it’s better to use a Float. Same for dividing.

Thanks. Just to make sure I understand. Are you confirming that in my initial code the function will specialise?

This isn’t completely true, and `:: Function` is an important exception to the rule. See Performance Tips · The Julia Language

3 Likes

I’m probably missing something but I don’t understand why you can’t pull this function out:

``````aux_f(x::Float64) = x + V(x,a)
``````

to take the `a` value as an argument instead:

``````aux_f(x::Float64, a) = x + V(x,a)
``````

so instead of remaking `aux_f` for each `a` you just feed the `a` value into the same `aux_f`. Youwould also have to edit the calls in `gss` from `f(x)` to `f(x, a)`, but you wouldn’t even have to edit the header of `gss` because in `solve_gss` you pass the same `a` into the `gss` call.

1 Like

I should have said: in your example code you can pull the definition of `aux_f` out from `solve_gss` to `solve_z`, but I wasn’t certain if you could do that in general.

So when you say you cannot do it, do you mean that it’s impossible in your full code, unlike in this example?

Exactly. I know that in this example I can. In my full code, it’s something that I need to do.

Thanks for chipping in. This code is just a toy example of a much bigger code. I’m aware that here I could pull out the function. In my full code I can’t.

Oh I see. It’d be cool if you could make a new example to show why you can’t, but that’s not necessarily related to your question.
I think you could still try lifting the function out in your original example just to see if it changes anything for the profiler and performance, just to check if the function-redefining is the source of what you’re seeing. If not, then we know for sure to look for the problem elsewhere.

1 Like

Good point. When I change `V` and `solve_gss` as you suggest:

``````V(x::Float64) = x^2

function solve_gss_Vout(a::Float64, b::Float64, V::Function)
# Find maximum
aux_x = gss(V, a, b)

return V(aux_x), aux_x
end

function solve_z_Vout(a::Float64, c::Float64, d::Float64, V::Function, gp::Int64 = 100)
# Create grid for z
grid_z = range(c, stop = d, length = gp)

# Iterate over z
aux_Vs = Vector{Float64}(undef, gp)
aux_xs = Vector{Float64}(undef, gp)
for (ind_z, z_td) in enumerate(grid_z)
aux_Vs[ind_z], aux_xs[ind_z] = solve_gss_Vout(a, z_td, V)
end

# Find optimal z
(maxV, maxLoc) = findmax(aux_Vs)

return maxV, grid_z[maxLoc], aux_xs[maxLoc]
end
``````

The profiler looks exactly the same as in my original example. Time and memory allocation seem very similar:

``````julia> @time solve_z(0.0, 0.0, 10.0, V)
0.000054 seconds (502 allocations: 11.125 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @time solve_z_Vout(0.0, 0.0, 10.0, V)
0.000051 seconds (502 allocations: 11.125 KiB)
(99.99963159887011, 10.0, 9.99998157992654)
``````

This suggests that is not about the creation of a new function in each call.

1 Like

I made a version of this that pulled out the function definition (and removed the temporary arrays) which produced zero allocations. Unfortunately, I’m not at my computer now.

The speedup was not dramatic, though, maybe 30%

2 Likes

Great! I look forward to seeing your code. Thanks.

``````function gss(f::Function, a, b, tol = 1e-5)
ϕ = MathConstants.golden
c = b - (b - a) / ϕ
d = a + (b - a) / ϕ
while abs(c - d) > tol
if f(c) > f(d)
b = d
else
a = c
end
c = b - (b - a) / ϕ
d = a + (b - a) / ϕ
end
return (b + a) / 2
end

V(x, y) = x^2 + y^2

function solve_gss(a, b, g::Function)
x = gss(g, a, b)
return g(x), x
end

function solve_z(a, c, d, f::Function, gp::Int = 100)
local z, zs
grid_z = range(c, d; length = gp)
Vs = -Inf
for z_ in grid_z
Vs_, zs_ = solve_gss(a, z_, x->x+f(x, a))
if Vs_ > Vs
Vs, z, zs = Vs_, z_, zs_
end
end
return Vs, z, zs
end
``````
``````julia> @btime solve_z_old(0.0, 0.0, 10.0, V)
42.033 μs (503 allocations: 11.16 KiB)
(109.99961317879665, 10.0, 9.99998157992654)

julia> @btime solve_z(0.0, 0.0, 10.0, V)
31.975 μs (0 allocations: 0 bytes)
(109.99961317879665, 10.0, 9.99998157992654)
``````
2 Likes

Thanks. I’ve played around with your code. The reduction in allocation is basically due to the fact that in this example you can define the function that is passed to `solve_gss` as `x->x+f(x, a)` which in my full code I can’t do.