Why is this closure over keyword arguments causing such a dramatic slowdown?

I’m writing a function that allows a user to freeze some variables of a function (needs to be generic with respect to the names and number of keyword arguments). Design-wise, I now have exactly what I need and the code is nice and neat, but I am really surprised at the overhead I am seeing.

Does anyone know why I am hitting this slowdown (roughly 70x in the benchmark below) and whether it can be optimised?

Minimum Working Example below:

using BenchmarkTools  # provides @btime

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

L1(arg1; kwargs...) = L0(arg1; firstkwargs..., kwargs...)

@btime L0(0; $allkwargs...)
@btime L1(0; $lastkwargs...)

Which returns:

1.205 ns (0 allocations: 0 bytes)
87.518 ns (5 allocations: 144 bytes)

In this line:

L1(arg1; kwargs...) = L0(arg1; firstkwargs..., kwargs...)

firstkwargs is a global variable. The number one performance tip is to avoid non-constant global variables: https://docs.julialang.org/en/v1/manual/performance-tips/index.html

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

const allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
const firstkwargs = (kwarg1=1, kwarg2=2)
const lastkwargs = (kwarg3=3, kwarg4=4)

L1(arg1; kwargs...) = L0(arg1; firstkwargs..., kwargs...)

@btime L0(0; $allkwargs...) # 1.422 ns (0 allocations: 0 bytes)
@btime L1(0; $lastkwargs...) # 1.421 ns (0 allocations: 0 bytes)
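
To see why const matters here, a minimal sketch (the names frozen_nonconst, frozen_const, sum_nonconst and sum_const are made up for illustration) can ask the compiler what it is able to infer. Base.return_types is an introspection helper, not something the code above relies on:

```julia
# Hypothetical names for illustration; only the const-ness differs.
frozen_nonconst = (kwarg1=1, kwarg2=2)
const frozen_const = (kwarg1=1, kwarg2=2)

sum_nonconst() = frozen_nonconst.kwarg1 + frozen_nonconst.kwarg2
sum_const() = frozen_const.kwarg1 + frozen_const.kwarg2

# Base.return_types reports what the compiler can infer for a call signature.
# For the non-const global the result is Any, because the binding could be
# reassigned to a value of another type at any time.
println(Base.return_types(sum_nonconst, ()))

# For the const global the compiler knows the type (and here even the value),
# so it infers a concrete integer type.
println(Base.return_types(sum_const, ()))
```

An inferred type of Any usually means dynamic dispatch and allocations at run time, which would be consistent with the timings in the original post.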

@Mason Thanks, that makes sense. I was actually doing the ‘freezing’ in a function, so I thought I had avoided the global scope, but it appears not. Here is a slightly embellished minimum working example:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    return (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., params_to_freeze...)
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
const firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

L1(arg1; kwargs...) = freeze_params(L0, firstkwargs)

@btime L0(0; $allkwargs...)
@btime L1(0; $lastkwargs...)

And indeed, exactly as you have said, making firstkwargs const has solved the issue. I also tested just inserting the literal value of firstkwargs into the freeze_params function call and that has a similar effect.

Without const:

  1.506 ns (0 allocations: 0 bytes)
  242.597 ns (2 allocations: 64 bytes)

With const before firstkwargs:

  1.506 ns (0 allocations: 0 bytes)
  0.001 ns (0 allocations: 0 bytes)

In the above MWE would there be any way to force freeze_params to use a local version of params_to_freeze and thus to avoid having to force a const prefix in the code above?

You could do this:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

const L1 = let firstkwargs = firstkwargs
    (arg1; kwargs...) -> L0(arg1; firstkwargs..., kwargs...)
end

@btime L0(0; $allkwargs...)  # 1.421 ns (0 allocations: 0 bytes)
@btime L1(0; $lastkwargs...) # 1.420 ns (0 allocations: 0 bytes)

Here, instead of defining a regular function L1, I bound the const L1 to the anonymous function (arg1; kwargs...) -> L0(arg1; firstkwargs..., kwargs...). The way I used the let block there ensures that firstkwargs is not a global in the scope of the anonymous function.
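
As a small sketch of what that capture means in practice (L1_demo and the let-local name frozen are invented for this illustration), the closure keeps the value that firstkwargs had when the let block ran, so rebinding the global later does not change the frozen function:

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

firstkwargs = (kwarg1=1, kwarg2=2)

# `frozen` is a new local variable inside the let block; the anonymous
# function captures it instead of looking up the global `firstkwargs`.
const L1_demo = let frozen = firstkwargs
    (arg1; kwargs...) -> L0(arg1; frozen..., kwargs...)
end

before = L1_demo(0; kwarg3=3, kwarg4=4)

# Rebinding the non-const global afterwards is not seen by the closure.
firstkwargs = (kwarg1=100, kwarg2=200)
after = L1_demo(0; kwarg3=3, kwarg4=4)

println(before == after)  # the frozen values are unchanged
```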

This has some downsides though, like this:

julia> L1
 #14 (generic function with 1 method)

and you can’t add methods to L1 in the regular way you might expect.


Are you sure?

julia> begin
       function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
           arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
       end
       
       allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
       firstkwargs = (kwarg1=1, kwarg2=2)
       lastkwargs = (kwarg3=3, kwarg4=4)
       
       const L1 = let firstkwargs = firstkwargs
           (arg1; kwargs...) -> L0(arg1; firstkwargs..., kwargs...)
       end
       end
#8 (generic function with 1 method)

julia> L1(::Float64) = "hi"
ERROR: cannot define function L1; it already has a value
Stacktrace:
 [1] top-level scope at none:0
 [2] top-level scope at REPL[2]:1

Thanks I think I am with you. Following up on that, what do you mean by this?

I tried with one value as a Float64 for example (which is the most exotic example these functions are likely to face) and it seemed OK.

Regarding changes in firstkwargs not propagating, that is no problem at all for the software requirements.

My last challenge, then, is this: how do I capture your const and let block logic within the above freeze_params function? It seems I was caught out by not considering pass-by-sharing in Julia functions. One idea is to copy the variable within the function so that it is forced to be local.

Following up on that with a test, deepcopy doesn’t seem to help me here:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    local_params_to_freeze = deepcopy(params_to_freeze)
    return (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., local_params_to_freeze...)
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

L1(arg1; kwargs...) = freeze_params(L0, firstkwargs)

@btime L0(0; $allkwargs...) # 1.205 ns (0 allocations: 0 bytes)
@btime L1(0; $lastkwargs...) # 73.484 ns (3 allocations: 400 bytes)

The problem with your freeze_params call is that it’s inside the L1 function, not outside, so L1 still reads the global firstkwargs every time it is called. You can do this though:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., params_to_freeze...)
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

const L1 = freeze_params(L0, firstkwargs)
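
One detail worth checking when comparing the two forms: a definition like L1(arg1; kwargs...) = freeze_params(L0, firstkwargs) ignores its own arguments and returns the frozen closure rather than a number, which may be why some of the earlier timings look implausibly small. A minimal sketch, with L1_wrapper and L1_frozen as made-up names for the two variants:

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., params_to_freeze...)
end

firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

# Variant from the earlier post: builds a new closure on every call and
# returns it without ever calling L0.
L1_wrapper(arg1; kwargs...) = freeze_params(L0, firstkwargs)

# Variant suggested above: build the closure once and bind it to a const.
const L1_frozen = freeze_params(L0, firstkwargs)

println(L1_wrapper(0; lastkwargs...) isa Function)  # a closure, not a sum
println(L1_frozen(0; lastkwargs...))                # the actual sum
```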

Another option if you want L1 to be a regular function which can have methods added to it is to use @eval:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

@eval L1(arg1; kwargs...) = L0(arg1; $(firstkwargs)..., kwargs...)

This will be just as fast as the const let form but can have methods added to it.
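
A short sketch of the "can have methods added" part, assuming it is run at top level (L1_eval is a made-up name so it does not clash with the definitions above):

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

firstkwargs = (kwarg1=1, kwarg2=2)

# $(firstkwargs) splices the current *value* of the global into the method
# body, so the compiled method no longer refers to the global binding.
@eval L1_eval(arg1; kwargs...) = L0(arg1; $(firstkwargs)..., kwargs...)

# Because L1_eval is an ordinary generic function, another method can be
# added with the usual syntax.
L1_eval(::String) = "hi"

println(L1_eval(0; kwarg3=3, kwarg4=4))
println(L1_eval("some text"))
```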

Using eval like this is a bad idea if you’re using it in a local scope though as your definition will ‘leak’:

julia> let
           function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
               arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
           end
       
           allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
           firstkwargs = (kwarg1=1, kwarg2=2)
           lastkwargs = (kwarg3=3, kwarg4=4)
       
           #@generated L1(arg1; kwargs...) = :(L0(arg1; $(firstkwargs)..., kwargs...))
           @eval L1(arg1; kwargs...) = L0(arg1; $(firstkwargs)..., kwargs...)
       end
L1 (generic function with 1 method)

julia> L1
L1 (generic function with 1 method)

so if these functions might get created in a local scope instead of global scope, you can use a @generated function instead:

@generated L1(arg1; kwargs...) = :(L0(arg1; $(firstkwargs)..., kwargs...))

which does not leak into the global scope:

julia> let
           function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
               arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
           end
       
           allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
           firstkwargs = (kwarg1=1, kwarg2=2)
           lastkwargs = (kwarg3=3, kwarg4=4)
       
           @generated L1(arg1; kwargs...) = :(L0(arg1; $(firstkwargs)..., kwargs...))
       end
(::var"#L1#12"{var"#L1#8#13"}) (generic function with 1 method)

julia> L1
ERROR: UndefVarError: L1 not defined

I think so, but I’m probably misunderstanding or making an error. When I run:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

const L1 = let firstkwargs = firstkwargs
    (arg1; kwargs...) -> L0(arg1; firstkwargs..., kwargs...)
end

@btime L0(0; $allkwargs...)  # 1.421 ns (0 allocations: 0 bytes)
@btime L1(0; $lastkwargs...) # 1.420 ns (0 allocations: 0 bytes)

Then I can do:

julia> L1(0; lastkwargs...)
10

julia> L1(0.0; lastkwargs...)
10.0

L1 is still a generic function that can take in arguments of different types, but you can’t use multiple dispatch in the normal way to add different methods that do different things. Your example is just duck typing.

My example:

julia> begin
       function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
           arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
       end
       
       allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
       firstkwargs = (kwarg1=1, kwarg2=2)
       lastkwargs = (kwarg3=3, kwarg4=4)
       
       const L1 = let firstkwargs = firstkwargs
           (arg1; kwargs...) -> L0(arg1; firstkwargs..., kwargs...)
       end
       end
#8 (generic function with 1 method)

julia> L1(::Float64) = "hi"
ERROR: cannot define function L1; it already has a value
Stacktrace:
 [1] top-level scope at none:0
 [2] top-level scope at REPL[2]:1

is multiple dispatch


In case it was unclear, I should also note that a @generated function also works in the global scope just fine.


@Mason thanks so much for all of the above, it is extremely helpful. I think I am starting to understand (particularly on duck typing), though I’ve always had difficulty understanding the @generated macro.

I’m trying to put it all together and hit an error message unfortunately:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    @generated out(arg1; kwargs...) = :(func_to_freeze(arg1; $(params_to_freeze)..., kwargs...))
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

L1 = freeze_params(L0, firstkwargs)

@btime L0(0; $allkwargs...) 
@btime L1(0; $lastkwargs...)

The error is:
ERROR: LoadError: syntax: Global method definition around C:\Users\J\Documents\Work\20200306_FWinvestigate.jl:771 needs to be placed at the top level, or use "eval".

You’re still doing things in the wrong order and using non-constant globals.

The error you got is because you are trying to create a generated function that relies on data that doesn’t exist yet. In

function freeze_params(func_to_freeze, params_to_freeze)
    @generated out(arg1; kwargs...) = :(func_to_freeze(arg1; $(params_to_freeze)..., kwargs...))
end

params_to_freeze doesn’t have a value until the freeze_params function is executed, but the @generated macro runs at parse time which is before freeze_params is ever run.
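
As a rough illustration of how early a generated function does its work (this is a generic sketch with a made-up function describe_type, not a fix for freeze_params): inside the generator, the argument names refer to the argument types, because the run-time values do not exist yet.

```julia
# Inside the body of a @generated function, `x` is bound to the *type* of
# the argument. The body runs when a method is compiled for that type and
# must return an expression (or a literal value) to use as the method body.
@generated function describe_type(x)
    type_name = string(x)   # e.g. "Float64"; computed before any call runs
    return :( $type_name )  # the generated method simply returns this string
end

println(describe_type(1.0))
println(describe_type("text"))
```

Anything the generator splices in therefore has to be available before the call happens, which run-time arguments of an enclosing function are not.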


Thanks that makes sense

You can do this:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freeze_params(func_to_freeze, params_to_freeze)
    (args...; kwargs...) -> func_to_freeze(args...; params_to_freeze..., kwargs...)
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

@generated L1(args...; kwargs...) = :($(freeze_params(L0, firstkwargs))(args...; kwargs...))

@btime L0(0; $allkwargs...) 
@btime L1(0; $lastkwargs...)

or instead have

@eval L1(args...; kwargs...) = $(freeze_params(L0, firstkwargs))(args...; kwargs...)

or

const L1 = freeze_params(L0, firstkwargs)

const let blocks, @generated functions and @eval all have their own strengths and weaknesses so which one is preferable depends on your situation.


Thanks @Mason, I did some reading after this (https://docs.julialang.org/en/v1.4/manual/performance-tips/#man-performance-captured-1). It seems the let approach is the most flexible, so I am trying that. It seems to be the right way forward for me, but I think I’m doing something wrong as it is not giving the expected speedup.

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freezeparams(func_to_freeze, params_to_freeze)
    (arg1; kwargz...) -> func_to_freeze(arg1; kwargz..., params_to_freeze...)
end

function freezeparams_let(func_to_freeze, params_to_freeze)
    f = let params_to_freeze = params_to_freeze
        (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., params_to_freeze...)
    end
    f
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

L1_FP_WITHARG_GLOBAL(arg1; kwargs...) = freezeparams(L0, firstkwargs)
L1_FP_WITHARG_INTERP(arg1; kwargs...) = freezeparams(L0, (kwarg1=1, kwarg2=2))

L1_FP_NOARG_GLOBAL = freezeparams(L0, firstkwargs)
L1_FP_NOARG_INTERP = freezeparams(L0, (kwarg1=1, kwarg2=2))

L1_FPL_WITHARG_GLOBAL(arg1; kwargs...) = freezeparams_let(L0, firstkwargs)
L1_FPL_WITHARG_INTERP(arg1; kwargs...) = freezeparams_let(L0, (kwarg1=1, kwarg2=2))

L1_FPL_NOARG_GLOBAL = freezeparams_let(L0, firstkwargs)
L1_FPL_NOARG_INTERP = freezeparams_let(L0, (kwarg1=1, kwarg2=2))

println("\nReference case with no freezing")
@btime L0(0; $allkwargs...) # 1.205 ns (0 allocations: 0 bytes)

println("\nL1_FP_WITHARG_GLOBAL")
@btime L1_FP_WITHARG_GLOBAL(0; $lastkwargs...) # 271.585 ns (2 allocations: 64 bytes)

println("\nL1_FP_WITHARG_INTERP")
@btime L1_FP_WITHARG_INTERP(0; $lastkwargs...) # 1.506 ns (0 allocations: 0 bytes)

println("\nL1_FP_NOARG_GLOBAL")
@btime L1_FP_NOARG_GLOBAL(0; $lastkwargs...) # 43.997 ns (1 allocation: 32 bytes)

println("\nL1_FP_NOARG_INTERP")
@btime L1_FP_NOARG_INTERP(0; $lastkwargs...) # 43.092 ns (1 allocation: 32 bytes)

println("\nL1_FPL_WITHARG_GLOBAL")
@btime L1_FPL_WITHARG_GLOBAL(0; $lastkwargs...) # 269.493 ns (2 allocations: 64 bytes)

println("\nL1_FPL_WITHARG_INTERP")
@btime L1_FPL_WITHARG_INTERP(0; $lastkwargs...) # 1.506 ns (0 allocations: 0 bytes)

println("\nL1_FPL_NOARG_GLOBAL")
@btime L1_FPL_NOARG_GLOBAL(0; $lastkwargs...) # 42.490 ns (1 allocation: 32 bytes)

println("\nL1_FPL_NOARG_INTERP")
@btime L1_FPL_NOARG_INTERP(0; $lastkwargs...) # 42.490 ns (1 allocation: 32 bytes)

The most unexpected thing I noticed was that there seems to be a performance difference between assigning an anonymous function (the output of freezeparams) to a variable (e.g. f) and wrapping it in a function definition (e.g. f(arg1; kwargs...) = ...), with the second seemingly faster.

Regarding the capturing of the global variable, the let block appears not to make a difference (though I could reproduce the example in the Julia docs). The only thing improving the situation is directly splicing the parameters to freeze into the function call. (Which is an acceptable limitation of the software we’re writing.)

This let block doesn’t do anything:

function freezeparams_let(func_to_freeze, params_to_freeze)
    f = let params_to_freeze = params_to_freeze
        (arg1; kwargs...) -> func_to_freeze(arg1; kwargs..., params_to_freeze...)
    end
    f
end

The inside of the function is already a local scope. Your problems are happening in the global scope.

Everything on the left hand side of the equals signs here:

L1_FP_NOARG_GLOBAL = freezeparams(L0, firstkwargs)
L1_FP_NOARG_INTERP = freezeparams(L0, (kwarg1=1, kwarg2=2))

L1_FPL_WITHARG_GLOBAL(arg1; kwargs...) = freezeparams_let(L0, firstkwargs)
L1_FPL_WITHARG_INTERP(arg1; kwargs...) = freezeparams_let(L0, (kwarg1=1, kwarg2=2))

L1_FPL_NOARG_GLOBAL = freezeparams_let(L0, firstkwargs)
L1_FPL_NOARG_INTERP = freezeparams_let(L0, (kwarg1=1, kwarg2=2))

are non-constant global variables, so running L1_FP_NOARG_INTERP(0; lastkwargs...) for instance is going to be slow: the binding could be reassigned at any time, so the compiler has to look up what L1_FP_NOARG_INTERP currently is and dispatch dynamically on every call.
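
If const is not an option, one alternative that was not suggested above is the function-barrier pattern from the performance tips: pass the frozen closure as an argument to the function doing the hot work, so the global is looked up once and everything inside the barrier is compiled for the closure's concrete type. A minimal sketch, where accumulate_calls is a hypothetical stand-in for the real workload:

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freezeparams(func_to_freeze, params_to_freeze)
    (arg1; kwargz...) -> func_to_freeze(arg1; kwargz..., params_to_freeze...)
end

# Non-constant global: calling it directly from top level is dynamic.
frozen = freezeparams(L0, (kwarg1=1, kwarg2=2))

# Function barrier: inside this function `f` is a local with a known type,
# so the repeated calls in the loop can be specialised.
function accumulate_calls(f, n)
    total = 0
    for i in 1:n
        total += f(i; kwarg3=3, kwarg4=4)
    end
    return total
end

# The global is read once here, when the arguments are evaluated.
println(accumulate_calls(frozen, 3))
```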

Why are you not using any of the techniques I suggested above? I feel a little frustrated as I put a fair amount of work into writing them out and explaining them but then you’re not using them.


I may have misunderstood https://docs.julialang.org/en/v1.4/manual/performance-tips/#man-performance-captured-1 but it has the example:

function abmult3(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
            x -> x * r
    end
    return f
end

and says that “The let block creates a new variable r whose scope is only the inner function. The second technique recovers full language performance in the presence of captured variables.” (Beneath the final example of that section.)

Sorry you are frustrated. I certainly appreciate the help. I have tried to implement them, but my actual use case is more complicated than the above examples and I have not been able to implement them successfully. Ideally freeze_params would return a struct containing four functions, all of which would be frozen in the way above. And the behaviour of freeze_params should be the same whether used in global scope or called by the user within a function (in the latter case we want to avoid any world-age issues). Also, we are trying to avoid user-facing macros and const keywords if possible, as a key audience for our software will be people who have no experience with Julia, and perhaps very little programming experience in general. I am going to try again with your methods this evening and will report back with any progress.
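
For what it's worth, one shape such a design could take is a small callable struct instead of a closure. This is only a sketch under my own assumptions: the Frozen type, its field names and the evaluate_inside_function helper are invented here and hold a single function rather than four. It uses no macros or eval, so it behaves the same in global and local scope:

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

# Hypothetical container: the type parameters record the concrete types of
# the wrapped function and of the frozen keyword arguments.
struct Frozen{F,P<:NamedTuple}
    func::F
    params::P
end

# Make instances callable; frozen parameters are splatted last, as in
# freeze_params above.
(fz::Frozen)(arg1; kwargs...) = fz.func(arg1; kwargs..., fz.params...)

freeze_params(func_to_freeze, params_to_freeze) = Frozen(func_to_freeze, params_to_freeze)

# Works inside a function without eval, so no world-age concerns arise.
function evaluate_inside_function()
    frozen = freeze_params(L0, (kwarg1=1, kwarg2=2))
    return frozen(0; kwarg3=3, kwarg4=4)
end

println(evaluate_inside_function())
```

A binding to a Frozen instance in global scope still needs const, or has to be passed into a function as an argument, to avoid the non-constant global cost discussed above.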

Edit: also, for some reason it will not let me mark your first response (or any of them) as the solution, otherwise I would have done so. Apologies I’m not sure why that is the case.

Links to something closer to what we are trying to do:

Ah yes, but this is a separate issue, at least from the one in your minimal working example. The issue you’re having, as far as I can tell, is not that variables in your closures are boxed but that you’re depending on globals.
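
One way to check this for yourself, sketched here with fieldtypes as an inspection tool, is to look at the field types of the closure objects. abmult and abmult3 are the examples from the performance tips; freezeparams and L0 are as defined earlier in the thread:

```julia
function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freezeparams(func_to_freeze, params_to_freeze)
    (arg1; kwargz...) -> func_to_freeze(arg1; kwargz..., params_to_freeze...)
end

# From the docs: `r` is reassigned and also captured, which is the case
# where the captured variable may end up in a Core.Box.
function abmult(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end

# From the docs: the let block introduces a fresh `r` that is assigned once.
function abmult3(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end

frozen = freezeparams(L0, (kwarg1=1, kwarg2=2))

println(fieldtypes(typeof(abmult(2))))   # inspect for Core.Box on your version
println(fieldtypes(typeof(abmult3(2))))  # a plain integer field

# freezeparams never reassigns its arguments, so nothing is boxed there.
println(Core.Box in fieldtypes(typeof(frozen)))
```

If no Core.Box shows up for the frozen closure, a let block inside freezeparams has nothing to fix, and the remaining cost is in how the result is bound at global scope.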

Just in case I came off a bit gruff, I’m not angry at you or anything and I am happy to help. I’m just sensing that not everything I’m saying is coming across and I’m having trouble figuring out how to communicate more clearly.

Can you explain why doing this does not solve your problem:

function L0(arg1; kwarg1, kwarg2, kwarg3, kwarg4)
    arg1 + kwarg1 + kwarg2 + kwarg3 + kwarg4
end

function freezeparams(func_to_freeze, params_to_freeze)
    (arg1; kwargz...) -> func_to_freeze(arg1; kwargz..., params_to_freeze...)
end

allkwargs = (kwarg1=1, kwarg2=2, kwarg3=3, kwarg4=4)
firstkwargs = (kwarg1=1, kwarg2=2)
lastkwargs = (kwarg3=3, kwarg4=4)

const L1 = freezeparams(L0, firstkwargs)

@btime L0(0; $allkwargs...)
@btime L1(0; $lastkwargs...)

i.e. putting const in front of L1 = ...?