What does it mean to define a function inside a function? Will this result in additional performance overhead?

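using BenchmarkTools  # provides @btime

# Inner function defined at global scope
function test_inner(x,y)
    x+y
end
function test_outer(x,y)
    return test_inner(x,y)
end
# Same computation, but the inner function is defined inside the outer one
function test_outer2(x,y)
    function test_inner2(x,y)
        x+y
    end
    test_inner2(x,y)
end
x = 1.;y = 2.
@btime test_outer($x,$y)
@btime test_outer2($x,$y)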

We have

  2.300 ns (0 allocations: 0 bytes)
  2.100 ns (0 allocations: 0 bytes)

Julia compiles your functions with a JIT, so no: in this example the nested definition adds no runtime overhead.

I am a bit confused. The scope of the inner function definition seems to be limited to within the outer function. Does this mean that the inner function is redefined every time the outer function is executed? Why doesn’t this cause performance overhead?

Indeed, but a smart compiler can figure out that your test_inner2 never “escapes” the outer function, so it can just be compiled away.
This seems to be confirmed by the two functions generating the same lower-level (LLVM) code:

julia> @code_llvm test_outer(1., 2.)
; Function Signature: test_outer(Float64, Float64)
;  @ REPL[3]:1 within `test_outer`
define double @julia_test_outer_5117(double %"x::Float64", double %"y::Float64") #0 {
top:
;  @ REPL[3]:2 within `test_outer`
; ┌ @ REPL[2]:2 within `test_inner`
; │┌ @ float.jl:491 within `+`
    %0 = fadd double %"x::Float64", %"y::Float64"
; └└
  ret double %0
}

julia> @code_llvm test_outer2(1., 2.)
; Function Signature: test_outer2(Float64, Float64)
;  @ REPL[4]:1 within `test_outer2`
define double @julia_test_outer2_5121(double %"x::Float64", double %"y::Float64") #0 {
top:
;  @ REPL[4]:5 within `test_outer2`
; ┌ @ REPL[4]:3 within `test_inner2`
; │┌ @ float.jl:491 within `+`
    %0 = fadd double %"x::Float64", %"y::Float64"
    ret double %0
; └└
}

Therefore I’m not sure where the small timing difference comes from. It may just be a measurement artefact due to the short runtime of the function (in the nanosecond range).

Note however that in most cases, the inner function actually escapes your outer function because you want to return that inner function and reuse it later on. So these compiler optimizations are probably not representative of a realistic workflow.
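
To make the “escaping” case concrete, here is a minimal sketch. The name make_adder is made up for illustration and does not come from the posts above. The inner function is returned to the caller, so it has to carry the captured n along with it:

```julia
# Hypothetical example: `make_adder` is an illustrative name, not from the thread.
function make_adder(n)
    # `adder` uses `n` from the enclosing scope, so it must capture it.
    adder(x) = x + n
    return adder          # the inner function escapes the outer function here
end

add2 = make_adder(2)
add10 = make_adder(10)

add2(3)                   # 5
add10(3)                  # 13
add2.n                    # 2: the captured value is stored as a field of the closure
```

Since n is never reassigned, this kind of closure should still be easy for the compiler to handle.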

For compilation to work well, there are a few prerequisites, the most important one being type-stability. In the case of inner functions, also called closures, it is a bit more tricky because they have to capture variables defined outside of their scope. In this simple case it works fine, but in general this capture can lead to subtle performance pitfalls. See this section of the manual to learn more.

Wow! Thank you for the detailed explanation. It’s really helpful. :smiley:

Actually no, this is one of the ways Julia is less dynamic. To vastly oversimplify, Julia’s call-wise compilation is optimized by defining the inner function (and its possibly multiple methods) “in advance”, when the outer function is defined. This is why conditional method definitions do not work well in function bodies and are discouraged. You could emulate an actual runtime method definition in the global scope with eval, but then you either run into world-age issues if you try to call that method before returning to the global scope (basically, the outer function was compiled to call the obsolete method before you redefined it at runtime), or you have to go through the less optimizable invokelatest.
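
A small sketch of the world-age behaviour described above, assuming a recent Julia 1.x. The names current_answer and redefine_and_call are hypothetical and only serve the illustration:

```julia
# Hypothetical names; a sketch of the world-age issue, not a recommended pattern.
current_answer() = 1

function redefine_and_call()
    # Redefine the global method at runtime.
    @eval current_answer() = 2
    # This call still runs in the "world" in which redefine_and_call was
    # invoked, so it sees the old method.
    old = current_answer()
    # invokelatest looks up the newest method, at the cost of a dynamic call.
    new = Base.invokelatest(current_answer)
    return (old, new)
end

old, new = redefine_and_call()   # (1, 2)
current_answer()                 # 2, now that we are back at global scope
```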

Currently, inner functions (aka closures) get a function type in the global scope just like globally scoped functions do, only it’s hidden from sight. Each function type is associated with the function’s method table. At runtime, the places where you wrote the inner method definitions actually just instantiate that function type. Locals usually disappear after the outer method returns, so an inner function that gets returned has to capture any outer variables it uses for itself. None of your inner functions captured variables from the outer functions, so they’ll work like globally scoped functions.
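
One way to see this is to inspect what the outer function hands back. This is only a sketch with a made-up name (outer_returns_inner), and the hidden type name it prints is an implementation detail that may vary between Julia versions:

```julia
# Hypothetical example: an inner function that captures nothing.
function outer_returns_inner()
    inner(x, y) = x + y   # uses only its own arguments
    return inner
end

f = outer_returns_inner()
g = outer_returns_inner()

f isa Function                # true
typeof(f) === typeof(g)       # true: both calls instantiate the same hidden type
fieldcount(typeof(f))         # 0: nothing was captured
f === g                       # true: with no fields, every instance is identical
f(1.0, 2.0)                   # 3.0
```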

When you do capture variables, the inner function can have multiple instances with different captured values, all sharing the function’s type. If those variables are never reassigned, then it’s pretty optimizable. The performance pitfalls usually come when they are reassigned: it’s fundamentally hard for call-wise compilation to optimize that, and the linked section of the manual gives a few workarounds for some cases.
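
As a rough illustration of the reassignment pitfall (all names here are hypothetical): a captured variable that is reassigned ends up stored in an untyped Core.Box. Wrapping the value in a Ref, so that the binding itself is never reassigned, is one common way around it. That is my own choice for this sketch, not necessarily one of the workarounds the manual lists:

```julia
# Hypothetical names; a sketch of the captured-variable pitfall.
function make_counter_boxed()
    total = 0
    increment() = (total += 1)     # `total` is reassigned inside the closure
    return increment
end

function make_counter_ref()
    total = Ref(0)                 # the binding `total` is never reassigned...
    increment() = (total[] += 1)   # ...only the contents of the Ref change
    return increment
end

boxed = make_counter_boxed()
reffed = make_counter_ref()

fieldtype(typeof(boxed), :total)    # Core.Box (untyped container)
fieldtype(typeof(reffed), :total)   # Base.RefValue{Int} (concretely typed)

boxed()                             # 1
reffed()                            # 1
```

Both counters behave the same; the difference is that the compiler knows the type of what reffed holds, while the contents of a Core.Box are opaque to it.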

Very clear. Thanks a lot!