10x slowdown when passing function as argument

I’m trying to understand why passing a function as an argument is sometimes 10x slower than inheriting it via closure. Here’s a contrived MWE:

function outer1(f)
    out = zeros(1000)
    for i = 1:1000
        function inner(x)
            return f(2*x)
        end
        out[i] = inner(i)
    end
end

function outer2(f)
    out = zeros(1000)
    for i = 1:1000
        function inner(x,f)
            return f(2*x)
        end
        out[i] = inner(i,f)
    end
end

f(x) = exp(x)

@btime outer1($f)
@btime outer2($f)

The output is

  4.399 μs (1 allocation: 7.94 KiB)
  42.277 μs (1979 allocations: 38.84 KiB)

Where are all the extra allocations (which I assume are the reason for the slowdown) coming from?

The problem is that functions are automatically not specialized. You want function inner(x,f::F) where F Also, I would definitely recommend not defining the function in a loop.

I think I actually want outer2(f::F) where F - this fixes the issue, and the runtimes are about equal now.

For this MWE, I don’t see any difference in performance based on where I define the function. Why is it not recommended in general?

The function is independent of the loop iteration.

Yes, I realize it’s unnecessary to define it in the loop, but I’d like to understand specifically why it’s bad. e.g.,

  • is it because it’s ugly or hard to read?
  • is it b/c there are possible performance losses?

I’m not trying to argue that it’s good practice, but it would help me understand Julia better if I understood the consequences of doing dumb stuff like defining functions in loops.

I understand what you’re saying and I think it’s a good question to have answered. The 2 cents I gave was the only reason I could sufficiently articulate. I’ll leave it to those smarter than I to answer your latest inquiry.

Turns out that (yet again) the answer was lying in plain sight in the Performance Tips.

Julia avoids automatically specializing on argument type parameters in three specific cases: Type , Function , and Vararg . Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.

[…] If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.

Not that anyone asked me, but I think that these rules regarding specialisation of function arguments are absolutely horrible. It’s just too unintuitive why outer1() would be any different from outer2(). I assume the implicit assumption is that if fun(f::Function) merely passes f, then most likely it is a short function which is inlined and hence the lack of specialisation disappears?

Edit: Obviously the above remark is not intended to insult anyone or to claim that I knew how to do it better.

It does feel a little opaque to me, like I’m guessing at Julia’s behavior when it comes to passing functions. I was able to speed up my actual code by adding type specification, but it slowed back down 10x after I moved the functions out of the for loop in which they were defined :roll_eyes:.

Trying to reproduce it in a MWE again…

I can’t seem to reproduce the error where performance changed after moving the functions out of the loop, but I’m still having understanding function passing performance.

I can fix the slowdown in the original post by type specifying (I don’t need to type specify inner(…) but doing so doesn’t reduce performance).

function outer2(f::Fxn) where Fxn
    function inner(x,f::Fxn) where Fxn
        return f(2*x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = inner(i,f)
    end
end

However, if I pass the function thru twice, I can’t seem to avoid slowdown via type specification.

function outer3(f::Fxn) where Fxn
    function pass(x,f::Fxn) where Fxn
        return inner(x,f)
    end
    function inner(x,f::Fxn) where Fxn
        return f(2*x)
    end

    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(i,f)
    end
end

f(x) = exp(x)

Timing both gives

  4.226 μs (1 allocation: 7.94 KiB) # outer2
  50.668 μs (3979 allocations: 70.09 KiB) # outer3

Any suggestions?

Not to define pass and inner inside outer seems to help:

julia> function pass(x,f::Fxn) where Fxn
           return inner(x,f)
       end
       function inner(x,f::Fxn) where Fxn
           return f(2*x)
       end

       function outer4(f::Fxn) where Fxn
           out = zeros(1000)
           for i = 1:1000
               out[i] = pass(i,f)
           end
       end

       @btime outer4(exp)
  5.052 μs (1 allocation: 7.94 KiB)

If you are worried about “leaking” inner and pass into the surrounding scope, you could move them into a module outer_utils.

You have to anotate only the outermost function, and define functions before they are called:

function outer3(f::Fxn) where {Fxn}
    function inner(x,f)
        return f(2*x)
    end
    function pass(x,f)
        return inner(x,f)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(i,f)
    end
    out 
end

using BenchmarkTools
f(x) = exp(x)
@btime outer3(f)
  4.967 μs (1 allocation: 7.94 KiB)

Am I missing something or does this not make any sense at all?

Yeah, I’m having trouble seeing a pattern or rule.

Functions passed functions as arguments will automatically specialize if they call said function.
That is

inner(f,x) = f(x) # sort function args first to support `do` syntax

should specialize because it is calling f, but

pass(f,x) = inner(f,x)

won’t necessarilly, because it isn’t calling f.

Let’s use @noinline to try and make this more clear.

function outer000(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer001(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer010(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer011(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer100(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer101(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer110(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer111(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end

This yields:

f(x) = exp(x)
julia> @btime outer000(f);
  46.415 μs (1979 allocations: 38.84 KiB)

julia> @btime outer001(f);
  47.216 μs (1979 allocations: 38.84 KiB)

julia> @btime outer010(f);
  46.122 μs (1979 allocations: 38.84 KiB)

julia> @btime outer011(f);
  45.612 μs (1979 allocations: 38.84 KiB)

julia> @btime outer100(f);
  37.679 μs (1979 allocations: 38.84 KiB)

julia> @btime outer101(f);
  38.109 μs (1979 allocations: 38.84 KiB)

julia> @btime outer110(f);
  12.724 μs (1 allocation: 7.94 KiB)

julia> @btime outer111(f);
  12.402 μs (1 allocation: 7.94 KiB)

It’s the two functions not calling f that matter – they must either inline or specialize.

Seems like Julia’s function passing behavior is tricky to catch and profile - the specialization solution is the same as this solution proposed in 2018, and folks in the thread noted that diagnosing this with @code_warntype can be tricky.

If these functions behave this way, what about for functions inherited by closure? I realized a lot of my code optimizations over the last week had to do with calling closure-inherited functions in a specific way to ensure specialization.