I have found that using a small nested function to handle repeated code has resulted in unwanted allocations. I tried to simplify the below function:

``````function compute_distance(grid::Matrix{T}, I::CartesianIndex{2}, Ifirst::CartesianIndex{2}, Ilast::CartesianIndex{2}, h::Real, maxdist::Real) where {T <: AbstractFloat}
Iv = CartesianIndex(1, 0)
Ih = CartesianIndex(0, 1)

Jfirst = max(I-Iv, Ifirst)
Ufirst = Jfirst == I ? maxdist : grid[Jfirst]
Jlast = min(I+Iv, Ilast)
Ulast = Jlast == I ? maxdist : grid[Jlast]
Uv = min(Ufirst, Ulast)

Jfirst = max(I-Ih, Ifirst)
Ufirst = Jfirst == I ? maxdist : grid[Jfirst]
Jlast = min(I+Ih, Ilast)
Ulast = Jlast == I ? maxdist : grid[Jlast]
Uh = min(Ufirst, Ulast)

if Uh < maxdist && Uv < maxdist
disc = 2*h^2-(Uh-Uv)^2
if disc >= 0
return min(0.5*(Uh+Uv)+0.5*sqrt(disc), maxdist)
end
end

Umin = min(Uv, Uh)
return min(h+Umin, maxdist)
end
``````

using the function m():

``````function compute_distance(grid::Matrix{T}, I::CartesianIndex{2}, Ifirst::CartesianIndex{2}, Ilast::CartesianIndex{2}, h::Real, maxdist::Real) where {T <: AbstractFloat}
Iv = CartesianIndex(1, 0)
Ih = CartesianIndex(0, 1)

function m(Ii)
Jfirst = max(I-Ii, Ifirst)
Ufirst = Jfirst == I ? maxdist : grid[Jfirst]
Jlast = min(I+Ii, Ilast)
Ulast = Jlast == I ? maxdist : grid[Jlast]
Umin = min(Ufirst, Ulast)
end
Uh, Uv = m(Ih), m(Iv)

if Uh < maxdist && Uv < maxdist
disc = 2*h^2-(Uh-Uv)^2
if disc >= 0
return min(0.5*(Uh+Uv)+0.5*sqrt(disc), maxdist)
end
end

Umin = min(Uv, Uh)
return min(h+Umin, maxdist)
end
``````

When I benchmark these two versions with

``````grid = zeros(60,60);
R = CartesianIndices(grid)
Ifirst, Ilast = first(R), last(R)
@btime compute_distance(\$(grid), CartesianIndex(50, 20), \$Ifirst, \$Ilast, 0.1, 3.0)
``````

The first version gives

``````19.555 ns (0 allocations: 0 bytes)
``````

The second version gives

``````28.444 ns (1 allocation: 16 bytes)
``````

I thought maybe it was the captured variables creating a βboxβ as described in the performance tips, so I made m() a separate function which accepts Ifirst, Ilast, I, and grid as arguments, but this proved to be even worse:

``````285.022 ns (18 allocations: 288 bytes)
``````

Whatβs the cause of this? When can I not trust functions to handle small tasks within my code without worsening performance?

The problem is that the last line in your function `m`:

``````Umin = min(Ufirst, Ulast)
``````

Clashes with the outer variable in `compute_distance` (same variable name `Umin`):

``````Umin = min(Uv, Uh)
``````

If you rename any of these variables, or simple replace the first expression with:

``````return min(Ufirst, Ulast)
``````

The allocation will go away, and the two implementations should perform equally.

Btw, you can use `@code_warntype` to detect this type of problem:

``````julia> @code_warntype compute_distance2((grid), CartesianIndex(50, 20), Ifirst, Ilast, 0.1, 3.0)
Body::Any
1 ββ %1   = %new(Core.Box)::Core.Box
...
``````

The return type being `Any` (colored red) and the presence of `Core.Box` indicates that thereβs a problem. You can use a more detailed version of `@code_warntype` to get a better idea of where the problem occurs (look for the first red `Any`):

``````julia> @code_warntype debuginfo=:source compute_distance2((grid), CartesianIndex(50, 20), Ifirst, Ilast, 0.1, 3.0)
...
β    %163 = (Core.isdefined)(%1, :contents)::Bool
ββββ        goto #26 if not %163
25 β        goto #27
26 β        \$(Expr(:throw_undef_if_not, :Umin, false))
27 β %167 = (Core.getfield)(%1, :contents)::Any
β    %168 = (h + %167)::Any
``````

Itβs perhaps not super clear, but line 49 corresponds to `return min(h+Umin, maxdist)`, and we can see that `Core.isdefined` is called for the field `Umin` which indicates that the compiler is unable to fully deduce what that field is. You might find the output of `@code_lowered` easier to read.

As for why this happens, it has to do with how scoping works in Julia. If the variable is declared anywhere in the outer scope, and also in an inner scope, it will be available anywhere after the declaration in the inner scope. Not sure if that explanation made sense, but consider these examples:

``````function test()
for i = 1:10
k = i
end
println(k)
end

julia> test();
ERROR: UndefVarError: k not defined
``````

And now:

``````function test()
for i = 1:10
k = i
end
println(k)
k = 2
end

julia> test();
10
``````
3 Likes

To me, this second example represents a really weird scoping behavior, I thought this should throw an error too because you requested to print the value of `k` before declaring `k = 2` out of the loop scope. I thought `k` must be declared before entering the loop to be able to collect its value outside of the loop!