Understanding allocations when collapsing function arugments

Hi everyone,

I was hoping I could be helped in understanding the following concept. The code is a bit too long to provide a clean MWE, but I am trying to understanding the following:

julia> @btime integrand_B1_new($y,$params,$ygrid,$W_grid_F1_inv,$result_init)
  808.978 ns (0 allocations: 0 bytes)
0.7020207253704929

fun1(y) = integrand_B1_new(y,params,ygrid,W_grid_F1_inv, result_init) 

julia> @btime fun1($y)
  876.345 ns (2 allocations: 32 bytes)
0.7020207253704929

Is there a way to write the collapsed function fun1(y) in a way such that it does not allocate? I make calls to fun1(y) in a function like quadgk() to interpolate some intergrals over a grid. For example, I use quadgk(fun1, min, max) and over a few hundred (or thousand) points over the grid, the calls to fun1(y) add up quite a bit.

Conceptually understanding what may be going on would be the most helpful.

You are using global variables inside fun1. You always want to avoid doing that. One way is to use a closure instead.

3 Likes

What you are doing there is using a lot of global variables inside fun1. You are doing something like:

julia> g(x,y) = x + y
g (generic function with 1 method)

julia> function f1(x)
         g(x,y)
       end
f1 (generic function with 1 method)

and y is a global variable inside f1 . That would be ok if y was a const (constant global), but does not seem to be the case. That is very bad for performance because the type of y is not stable.

You can use a closure, for example:

julia> function f2(x, inner_g)
            inner_g(x)
       end
f2 (generic function with 1 method)

julia> f2(1, x -> g(x,2))
3

Or look at this thread: Define data-dependent function, various ways for more alternatives.

1 Like

Thanks @dpsanders! Does it matter that I am using fun1() inside main() and within main() I pass params,ygrid,W_grid_F1_inv, result_init as arguments. Would the globals still be an issue?

The allocation issue persists even when timed inside main().

Thanks @lmiq. Let me try your suggestion and get back to you. This is what I was writing to @dpsanders (to request an example), but you beat me to the punch :slight_smile:

Yes, this is a tricky thing. I guess you mean something like:

julia> function main(x)
         g(x,z) = x + z
         f(x) = g(x,y)
         y = 2
         f(x)
       end
main (generic function with 1 method)

julia> @btime main(1)
  20.894 ns (1 allocation: 16 bytes)
3

This is confusing and there are other discussion whether that should raise an error, a warning, or something else. The problem is that there is an ambiguity on whether y is a constant in the local scope, if it can vary, and that apparently is making the compiler make bad choices (outside my scope of understanding).

The safe alternative is being more explicit on what one wants:

julia> function main(x)
         g(x,y) = x + y
         f(x,inner_g) = inner_g(x)
         y = 2
         f(x, x -> g(x,y))
       end
main (generic function with 1 method)

julia> @btime main(1)
  0.025 ns (0 allocations: 0 bytes)
3

But with a lot of different parameters, I would play even safer and use a struct:

julia> struct Params
         y::Int
       end
       function main(x)
         g(x,y) = x + y 
         f(x,p) = g(x,p.y)
         p = Params(2)
         f(x,p)
       end
main (generic function with 1 method)

julia> @btime main(1)
  0.023 ns (0 allocations: 0 bytes)
3


I just opened another thread with a related issue: Performance closures with changing value of closed over variables

1 Like

Yes, this is exactly my use case. And, the reason I need to do an argument collapse is because I eventually want to use quadgk which needs a one function argument. Thank you again for the amazingly detailed responses. Let me study these, try and I will get back to you here.

Thanks for your suggestions! I made some progress, but not completely I think.

Here is what I was doing previously

fun1(y) = integrand_B1_new(y,params,ygrid,W_grid_F1_inv, result_init) 

Here is what I changed. Does this seem right?

function fun1_new(y::F, params::P,ygrid::Array{F},
                  W_grid_F1_inv::Array{F}, result_init::Array{S}, 
                  integrand_B1_new::Fxn) where {F<:Float64, S<:Int, P<:Param, Fxn}

    g(y, params,ygrid,W_grid_F1_inv, result_init, integrand_B1_new) =
      integrand_B1_new(y)

    g(y, params,ygrid,W_grid_F1_inv, result_init, 
      y -> integrand_B1_new(y,params,ygrid,W_grid_F1_inv,result_init))

end

I ran these two functions inside main(...., params, ygrid, W_grid_F1_inv, result_init, integrand_B1_new, .....) like so


function main(...., params, ygrid, W_grid_F1_inv, result_init, 
              integrand_B1_new, .....)

.
.
.

@timeit to "Quad GK new" quadgk(y -> fun1_new(y, params, ygrid, 
W_grid_F1_inv, result_init, integrand_B1_new),0.0,F2_grid[1])[1]
        
@timeit to "Quad GK old" quadgk(fun1,0.0,F2_grid[1])[1]

.
.
.

end

I got the following result

 Section       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────
 Quad GK old        1   69.0ΞΌs  71.4%  69.0ΞΌs   1.59KiB  82.3%  1.59KiB
 Quad GK new        1   27.6ΞΌs  28.6%  27.6ΞΌs      352B  17.7%     352B

The allocations reduced quite a bit although not completely gone. As mentioned, integrand_B_new() does not allocate because

@btime integrand_B1_new($y,$params,$ygrid,$W_grid_F1_inv,$result_init)
  808.978 ns (0 allocations: 0 bytes)
0.7020207253704929

Would you have any other suggestions?

Uhm… I think it would be better if you provided a minimal complete running example. I don’t understand exactly what you are doing there now.

1 Like

Might be a bit hard but let me try to wing something. That way you can run on your end as well.

But the general pattern is, for example:

Base.@kwdef struct P
  a::Float64
  b::Float64
  xmin::Float64
  xmax::Float64
  rtol:Float64
end

f(p) = quadgk(x -> exp(-p.a*x^p.b), p.xmin, p.xmax, rtol=p.rtol)

p = P(a=1.0, b=2.0, xmin=0.0, xmax=1.0, rtol=1e-8)

julia> @btime f($p)
  448.594 ns (13 allocations: 288 bytes)
(0.746824132812427, 7.887024366937112e-13)

@leandromartinez98 Since you asked for an MWE, here is the closest example I think. I am wondering if there is a way to get the allocations showing up in fun2() which is used by quadgk() inside main() to 0.


using QuadGK
using TimerOutputs

x = 3.0;
z = 2.0;
const to = TimerOutput();

function integrand_example(x, z) 
     
     x^2 + z  
    
end

fun1(x) = integrand_example(x, z)

function fun2(x, z, integrand_example)
  g(x, z, integrand_example) = integrand_example(x)
  g(x, z, x -> integrand_example(x, z))
end

function main(x, z, integrand_example)

    @timeit to "Quad GK new" quadgk(x -> fun2(x, z, integrand_example),0.0,0.6)[1]
    @timeit to "Quad GK old" quadgk(fun1,0.0,0.6)[1]

end

julia> @btime integrand_example($x, $z)
  0.001 ns (0 allocations: 0 bytes)
11.0

julia> @btime fun2($x, $z, $integrand_example)
  0.001 ns (0 allocations: 0 bytes)
11.0

main(x, z, integrand_example)
show(to, allocations = true, compact = false, sortby = :allocations)

                               Time                   Allocations
                       ──────────────────────   ───────────────────────
   Tot / % measured:        5.38s / 0.00%            825KiB / 0.21%

 Section       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────
 Quad GK old        1   47.0ΞΌs  82.3%  47.0ΞΌs   1.61KiB  93.6%  1.61KiB
 Quad GK new        1   10.1ΞΌs  17.7%  10.1ΞΌs      112B  6.36%     112B

To be truth, I do not understand what exactly that benchmark is reporting. Does not seem to make sense to me. Also your definitions of g there are redudant. A minimal example of the two approaches would be, I think:

using QuadGK
using TimerOutputs

function integrand_example(x, z)
     x^2 + z
end

#old
fun1(x) = integrand_example(x, z)
main_old() = quadgk(fun1,0.0,0.6)[1]

#new
main_new(x,z) = quadgk(x -> integrand_example(x,z),0.0,0.6)[1]

x = 3.0
z = 2.0

using BenchmarkTools

print("old: "); @btime main_old()

print("new: "); @btime main_new($x,$z)
julia> include("./bash.jl")
old:   1.406 ΞΌs (90 allocations: 1.61 KiB)
new:   256.691 ns (8 allocations: 176 bytes)
1.272

The problem with the β€œold” approach is more explicit if you write:

main_old() = quadgk(x -> fun1(x,z),0.0,0.6)[1]

x is the variable of integration (that is fine). But z is a parameter which is, in this case, a global, non-constant variable (if you declare const z = 2.0 insted, the difference goes away).

1 Like

Ah great, yes now I understand. So, it seems like what we are measuring then as 112B/176B is some overhead for quadgk(). Would that be correct? Admittedly, I’m not the TimerOutputs expert here :slight_smile:

Also, I think @stevengj would know better if quadgk() always allocates. I think it does based on what we just did, but I’m purely speculating.

No, no, actually you are measuring compilation time there, because you are benchmarking the first call…

You should do:

const to = TimerOutput()
function main(x,z)
  x = 3.0
  z = 2.0
  @timeit to "old" main_old()
  @timeit to "new" main_new(x,z)
end
main(x,z)
reset_timer!(to)
main(x,z)
show(to, allocations = true, compact = false, sortby = :allocations)

With which you get:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:      499ms / 0.00%           34.9MiB / 0.00%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 old            1   20.2ΞΌs  84.4%  20.2ΞΌs   1.61KiB  90.4%  1.61KiB
 new            1   3.73ΞΌs  15.6%  3.73ΞΌs      176B  9.65%     176B
 ──────────────────────────────────────────────────────────────────

Still I do not understand why the times differ so much from the use of @btime, but at least the allocations and trend make sense now.

Actually I had skipped timing the first call in what I posted and only showed you the second run (after doing a reset()).

Yes, I’m also a bit puzzled why the timings are different. Let me also check the docs.

1 Like