10x slowdown when passing function as argument

I’m trying to understand why passing a function as an argument is sometimes 10x slower than inheriting it via closure. Here’s a contrived MWE:

function outer1(f)
    out = zeros(1000)
    for i = 1:1000
        function inner(x)
            return f(2*x)
        end
        out[i] = inner(i)
    end
end

function outer2(f)
    out = zeros(1000)
    for i = 1:1000
        function inner(x,f)
            return f(2*x)
        end
        out[i] = inner(i,f)
    end
end

f(x) = exp(x)

@btime outer1($f)
@btime outer2($f)

The output is

  4.399 μs (1 allocation: 7.94 KiB)
  42.277 μs (1979 allocations: 38.84 KiB)

Where are all the extra allocations (which I assume are the reason for the slowdown) coming from?

The problem is that functions are automatically not specialized. You want function inner(x,f::F) where F Also, I would definitely recommend not defining the function in a loop.

3 Likes

I think I actually want outer2(f::F) where F - this fixes the issue, and the runtimes are about equal now.

For this MWE, I don’t see any difference in performance based on where I define the function. Why is it not recommended in general?

The function is independent of the loop iteration.

4 Likes

Yes, I realize it’s unnecessary to define it in the loop, but I’d like to understand specifically why it’s bad. e.g.,

  • is it because it’s ugly or hard to read?
  • is it b/c there are possible performance losses?

I’m not trying to argue that it’s good practice, but it would help me understand Julia better if I understood the consequences of doing dumb stuff like defining functions in loops.

I understand what you’re saying and I think it’s a good question to have answered. The 2 cents I gave was the only reason I could sufficiently articulate. I’ll leave it to those smarter than I to answer your latest inquiry.

2 Likes

Turns out that (yet again) the answer was lying in plain sight in the Performance Tips.

Julia avoids automatically specializing on argument type parameters in three specific cases: Type , Function , and Vararg . Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.

[…] If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.

6 Likes

Not that anyone asked me, but I think that these rules regarding specialisation of function arguments are absolutely horrible. It’s just too unintuitive why outer1() would be any different from outer2(). I assume the implicit assumption is that if fun(f::Function) merely passes f, then most likely it is a short function which is inlined and hence the lack of specialisation disappears?

Edit: Obviously the above remark is not intended to insult anyone or to claim that I knew how to do it better.

It does feel a little opaque to me, like I’m guessing at Julia’s behavior when it comes to passing functions. I was able to speed up my actual code by adding type specification, but it slowed back down 10x after I moved the functions out of the for loop in which they were defined :roll_eyes:.

Trying to reproduce it in a MWE again…

I can’t seem to reproduce the error where performance changed after moving the functions out of the loop, but I’m still having understanding function passing performance.

I can fix the slowdown in the original post by type specifying (I don’t need to type specify inner(…) but doing so doesn’t reduce performance).

function outer2(f::Fxn) where Fxn
    function inner(x,f::Fxn) where Fxn
        return f(2*x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = inner(i,f)
    end
end

However, if I pass the function thru twice, I can’t seem to avoid slowdown via type specification.

function outer3(f::Fxn) where Fxn
    function pass(x,f::Fxn) where Fxn
        return inner(x,f)
    end
    function inner(x,f::Fxn) where Fxn
        return f(2*x)
    end

    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(i,f)
    end
end

f(x) = exp(x)

Timing both gives

  4.226 μs (1 allocation: 7.94 KiB) # outer2
  50.668 μs (3979 allocations: 70.09 KiB) # outer3

Any suggestions?

Not to define pass and inner inside outer seems to help:

julia> function pass(x,f::Fxn) where Fxn
           return inner(x,f)
       end
       function inner(x,f::Fxn) where Fxn
           return f(2*x)
       end

       function outer4(f::Fxn) where Fxn
           out = zeros(1000)
           for i = 1:1000
               out[i] = pass(i,f)
           end
       end

       @btime outer4(exp)
  5.052 μs (1 allocation: 7.94 KiB)

If you are worried about “leaking” inner and pass into the surrounding scope, you could move them into a module outer_utils.

1 Like

You have to anotate only the outermost function, and define functions before they are called:

function outer3(f::Fxn) where {Fxn}
    function inner(x,f)
        return f(2*x)
    end
    function pass(x,f)
        return inner(x,f)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(i,f)
    end
    out 
end

using BenchmarkTools
f(x) = exp(x)
@btime outer3(f)
  4.967 μs (1 allocation: 7.94 KiB)
1 Like

Am I missing something or does this not make any sense at all?

Yeah, I’m having trouble seeing a pattern or rule.

Functions passed functions as arguments will automatically specialize if they call said function.
That is

inner(f,x) = f(x) # sort function args first to support `do` syntax

should specialize because it is calling f, but

pass(f,x) = inner(f,x)

won’t necessarilly, because it isn’t calling f.

Let’s use @noinline to try and make this more clear.

function outer000(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer001(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer010(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer011(f) # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer100(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer101(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f,x)
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer110(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f,x)
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end
function outer111(f::Fo) where {Fo} # 0 v 1 bools on specialization for outer, pass, inner
    @noinline function inner(f::Fi,x) where {Fi}
        return f(2*x)
    end
    @noinline function pass(f::Fp,x) where {Fp}
        return inner(f,x)
    end
    out = zeros(1000)
    for i = 1:1000
        out[i] = pass(f,i)
    end
    out 
end

This yields:

f(x) = exp(x)
julia> @btime outer000(f);
  46.415 μs (1979 allocations: 38.84 KiB)

julia> @btime outer001(f);
  47.216 μs (1979 allocations: 38.84 KiB)

julia> @btime outer010(f);
  46.122 μs (1979 allocations: 38.84 KiB)

julia> @btime outer011(f);
  45.612 μs (1979 allocations: 38.84 KiB)

julia> @btime outer100(f);
  37.679 μs (1979 allocations: 38.84 KiB)

julia> @btime outer101(f);
  38.109 μs (1979 allocations: 38.84 KiB)

julia> @btime outer110(f);
  12.724 μs (1 allocation: 7.94 KiB)

julia> @btime outer111(f);
  12.402 μs (1 allocation: 7.94 KiB)

It’s the two functions not calling f that matter – they must either inline or specialize.

7 Likes

Seems like Julia’s function passing behavior is tricky to catch and profile - the specialization solution is the same as this solution proposed in 2018, and folks in the thread noted that diagnosing this with @code_warntype can be tricky.

If these functions behave this way, what about for functions inherited by closure? I realized a lot of my code optimizations over the last week had to do with calling closure-inherited functions in a specific way to ensure specialization.