# Define data-dependent function, various ways

This is a topic that is frequently raised here (I have asked about it before myself). The problem is that of defining a function to be passed as a parameter to a solver, where the function depends on data but the function call inside the solver has no argument to accept that data. Usually, a new programmer's first attempt is to store the data in global variables, which is very detrimental to performance. Soon one finds out that declaring the data as a constant global restores good performance, but that is not very nice because one still depends on global variable names. The next strategy is to use closures, which is probably the most elegant syntax. Today I found out that we can also use `let` blocks for this, which, while not as elegant as closures (in my opinion), allow a very self-contained definition of the problem and its parameters.

I do not have any specific question, except to ask whether there is any further advice or idea that could improve the discussion. I have written a small piece of code with the four alternatives:

```julia
using BenchmarkTools
using Test

# The "solver"

function solver(f, x0)
    f(x0)
end

# The "data", declared or not as constant

data = collect(0:100)
const data_const = collect(0:100)

# The "initial point"

x0 = ones(length(data))

#
# Using global, non-constant data (wrong way)
#

function f_global_non_const_data(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data[i])^2
    end
    s
end
println(" Global, non-constant data: ")
f1 = @btime solver($f_global_non_const_data, $x0)

#
# Using constant global data
#
function f_global_const_data(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data_const[i])^2
    end
    s
end
println(" Global, constant data: ")
f2 = @btime solver($f_global_const_data, $x0)

#
# Using a closure (pass non-const data)
#
function f_closure(x, data)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data[i])^2
    end
    s
end
println(" Closure: ")
f3 = @btime solver(x -> f_closure(x, $data), $x0)

#
# Using a let block
#
let
    let_data = collect(0:100)
    function f_let(x, let_data)
        s = 0.0
        for i in 1:length(x)
            s += (x[i] - let_data[i])^2
        end
        s
    end
    global f_let(x) = f_let(x, let_data)
end
println(" Let block: ")
f4 = @btime solver($f_let, $x0)

@test f1 ≈ f2 ≈ f3 ≈ f4
```

All the alternatives, except the one using the non-const global, are fine (the differences among those benchmarks are just noise):

```
 Global, non-constant data: 
  5.259 μs (404 allocations: 6.31 KiB)
 Global, constant data: 
  88.059 ns (0 allocations: 0 bytes)
 Closure: 
  86.441 ns (0 allocations: 0 bytes)
 Let block: 
  88.073 ns (0 allocations: 0 bytes)
```

I believe the suggested way is to use the closure. It is just as performant as anything else, but also more flexible and less code.


Indeed, but I think the closure has one drawback: the function is defined at the call to the solver. If the solver is called multiple times, that also carries an allocation and performance penalty, doesn't it? Correct me if I'm wrong.

Edit:

I have added this test:

```julia
# Multiple calls:

function call_solver(f, x0)
    s = 0.0
    for i in 1:100
        s += solver(f, x0)
    end
    s
end

println("Multiple calls:")

println(" Global const data:")
@btime call_solver($f_global_const_data, $x0)
println(" Closure:")
@btime call_solver(x -> f_closure(x, $data), $x0)
println(" Let block:")
@btime call_solver($f_let, $x0)
```

I am a bit surprised that the first option is quite a bit slower. Any idea why?

```
Multiple calls:
 Global const data:
  27.744 μs (200 allocations: 3.13 KiB)
 Closure:
  11.315 μs (201 allocations: 3.14 KiB)
 Let block:
  11.267 μs (200 allocations: 3.13 KiB)
3.28351e7
```

The extra allocation of the closure is irrelevant here.

I think function-like objects (sometimes called “functors”) would be yet another option to achieve the same kind of thing.

(AFAIU, this is what’s internally used to implement closures, and I think it requires a bit less work for the compiler to optimize. Maybe someone more knowledgeable will chime in and confirm or correct this)
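As a rough illustration of that claimed equivalence (a hand-written sketch, not the actual code the compiler generates; the name `CapturedData` is made up here), a closure capturing `data` behaves much like a struct holding `data` plus a call method:

```julia
# Sketch: a closure is roughly a struct holding its captured
# variables, plus a method that makes the struct callable.
struct CapturedData{T}
    data::T
end

# The call method corresponds to the closure's body.
(c::CapturedData)(x) = sum((x[i] - c.data[i])^2 for i in eachindex(x))

data = collect(0:100)
x0 = ones(length(data))

g_struct  = CapturedData(data)                                  # explicit functor
g_closure = x -> sum((x[i] - data[i])^2 for i in eachindex(x))  # closure

g_struct(x0) == g_closure(x0)  # both return 328351.0
```

The visible difference is that the functor's field is explicit and documentable, while the closure's captured variable is implicit.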

```julia
struct Fun
    data::Vector{Int}
end

function (f::Fun)(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - f.data[i])^2
    end
    s
end

println(" Functor: ")
functor = Fun(0:100)
f5 = @btime solver($functor, $x0)
```

On my system, the single call benchmark yields:

```
 Global, non-constant data: 
  8.206 μs (404 allocations: 6.31 KiB)
 Global, constant data: 
  155.388 ns (0 allocations: 0 bytes)
 Closure: 
  138.079 ns (0 allocations: 0 bytes)
 Let block: 
  188.767 ns (0 allocations: 0 bytes)
 Functor: 
  155.384 ns (0 allocations: 0 bytes)
```

and the multiple-call benchmark:

```
 Global const data:
  18.120 μs (200 allocations: 3.13 KiB)
 Closure:
  18.159 μs (201 allocations: 3.14 KiB)
 Let block:
  19.642 μs (200 allocations: 3.13 KiB)
 Functor:
  13.994 μs (0 allocations: 0 bytes)
```

Complete code, benchmarking all variants so far
```julia
using BenchmarkTools
using Test

# The "solver"

function solver(f, x0)
    f(x0)
end

# The "data", declared or not as constant

data = collect(0:100)
const data_const = collect(0:100)

# The "initial point"

x0 = ones(length(data))

#
# Using global, non-constant data (wrong way)
#

function f_global_non_const_data(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data[i])^2
    end
    s
end
println(" Global, non-constant data: ")
f1 = @btime solver($f_global_non_const_data, $x0)

#
# Using constant global data
#
function f_global_const_data(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data_const[i])^2
    end
    s
end
println(" Global, constant data: ")
f2 = @btime solver($f_global_const_data, $x0)

#
# Using a closure (pass non-const data)
#
function f_closure(x, data)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - data[i])^2
    end
    s
end
println(" Closure: ")
f3 = @btime solver(x -> f_closure(x, $data), $x0)

#
# Using a let block
#
let
    let_data = collect(0:100)
    function f_let(x, let_data)
        s = 0.0
        for i in 1:length(x)
            s += (x[i] - let_data[i])^2
        end
        s
    end
    global f_let(x) = f_let(x, let_data)
end
println(" Let block: ")
f4 = @btime solver($f_let, $x0)

#
# Using a callable struct ("functor")
#
struct Fun
    data::Vector{Int}
end

function (f::Fun)(x)
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - f.data[i])^2
    end
    s
end

println(" Functor: ")
functor = Fun(0:100)
f5 = @btime solver($functor, $x0)

@test f1 ≈ f2 ≈ f3 ≈ f4 ≈ f5

# Multiple calls:

function call_solver(f, x0)
    s = 0.0
    for i in 1:100
        s += solver(f, x0)
    end
    s
end

println("Multiple calls:")

println(" Global const data:")
@btime call_solver($f_global_const_data, $x0)
println(" Closure:")
@btime call_solver(x -> f_closure(x, $data), $x0)
println(" Let block:")
@btime call_solver($f_let, $x0)
println(" Functor:")
@btime call_solver($functor, $x0)
```

No, my understanding is that it only gets compiled once; only the data in the closure changes (if the types are the same). Closures are fast and are used extensively in the Julia standard library.

However, you may need to define the closure inside a function when benchmarking.


I am a little intrigued by the 200 allocations that we see in all options except the `Functor` one in the multiple-call examples above. Any hint?

This is actually quite tricky, and I think extra care might have to be taken when using closures in these cases. Compilation occurs only once, but if the closure is defined as the argument function in global scope, which is the most common usage scenario, it allocates more than the other alternatives.

```julia
julia> s_outer(f, x0) = f(x0)
s_outer (generic function with 1 method)

julia> @time s_outer(x -> f_closure(x, data), x0) # gets compiled
  0.019374 seconds (16.87 k allocations: 951.150 KiB)
328351.0

julia> @time s_outer(x -> f_closure(x, data), x0) # still allocates
  0.005339 seconds (1.00 k allocations: 66.479 KiB)
328351.0

julia> @time s_outer(f_global_const_data, x0) # gets compiled
  0.009657 seconds (2.60 k allocations: 126.927 KiB)
328351.0

julia> @time s_outer(f_global_const_data, x0) # does not allocate
  0.000005 seconds (1 allocation: 16 bytes)
328351.0
```

Edit: I understand that you pointed to this:

```julia
julia> function sbench(data, x0)
           s_outer(x -> f_closure(x, data), x0)
       end
sbench (generic function with 1 method)

julia> @time sbench(data, x0)
  0.009449 seconds (8.29 k allocations: 405.983 KiB)
328351.0

julia> @time sbench(data, x0)
  0.000003 seconds (1 allocation: 16 bytes)
328351.0
```

But while this is how things will work inside any package (and that is great), it is not, I think, how most users will first pass closures to a solver.

One problem with the global constant data is that frequently this data (aka these other inputs) is/are not static. I suppose if you really only had one set of it then it wouldn’t matter. But if it changes, that obviously rules out that approach, whereas closures don’t care if it changes.
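To illustrate that point with a sketch (the `make_objective` factory below is hypothetical, not from the code above): a closure factory lets the data change freely, and since closures built by the same factory share one concrete type, the compiled code is reused:

```julia
# Hypothetical factory: builds a new objective for each data set.
make_objective(data) = x -> sum((x[i] - data[i])^2 for i in eachindex(x))

solver(f, x0) = f(x0)  # stand-in for a real solver call

x0  = ones(101)
f_a = make_objective(collect(0:100))  # first data set
f_b = make_objective(collect(1:101))  # the data changed; no problem

solver(f_a, x0)  # 328351.0
solver(f_b, x0)  # 338350.0

# Both closures share one concrete type, so nothing is recompiled:
typeof(f_a) === typeof(f_b)  # true
```

With the `const` global approach, by contrast, new data would require either mutating the constant array in place or redefining the objective function.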


Completely agreed. I do not like nor recommend the global constant approach, because of that.

I would second @ffevotte’s suggestion for a callable type. It’s like a closure, essentially, but easier to work with, document, etc, and it’s not much extra effort.


No, it’s not tricky at all. The basic rule is simple: as the manual says, “any code that is performance critical or being benchmarked should be inside a function.” This is the most important rule to follow when using Julia. If you follow this rule, closures do not allocate for type-stable code.

If you have a performance-critical operation (i.e. one called over and over, so that compilation overhead matters), then put it in a function. In your case:

```julia
julia> foo(x0, data) = s_outer(x -> f_closure(x, data), x0)

julia> @btime foo($x0, $data);
  108.264 ns (0 allocations: 0 bytes)
```

Note the lack of allocations. You can pass new `x0` and `data` vectors as often as you like, and it will continue not to allocate.

The `@btime` macro from BenchmarkTools.jl is the best way to do this kind of measurement. If you do

```julia
julia> @time foo(x0, data)
  0.008451 seconds (159 allocations: 9.859 KiB)
```

then you will see allocations, but that is because it is dispatching dynamically on the global variables `x0` and `data`, not because it is re-compiling anything. These allocations are not a problem, since the ~0.01 s overhead occurs only at the top level, in interactive code, where it is practically irrelevant. The same code would not allocate if it were called in another function (e.g. in a tight loop).

If you are working interactively and don’t want to define a function, then it should also be the case that top-level overheads are irrelevant. (Note that compilation time does not scale with the size of the problem, only with the size of the code, so interactive compilation times do not get worse as you go to more expensive problems, e.g. operating on lots of data.)


Please do not take me wrong, I do not disagree at all with you.

Yet, for a novice in Julia, the fact that functions themselves have to be defined inside other functions for the code to be performant might be tricky.

Also, I think I was probably confused by some other threads I have followed, in which people got slow code using closures. Now I think that was mostly because of a type instability coming from the function being defined outside the calling function, with global assignments. Things like this:

```julia
julia> f = x -> x^2
#1 (generic function with 1 method)

julia> solver(f, x) = f(x)
solver (generic function with 1 method)

julia> function g(x)
           s = 0.
           for i in 1:1000
               s = s + solver(x -> f(x), x)
           end
           s
       end
g (generic function with 1 method)

julia> g(1)
1000.0

julia> @code_warntype g(1)
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Int64
  s::Any
  @_4::Union{Nothing, Tuple{Int64,Int64}}
  i::Int64
  #3::var"#3#4"

Body::Any
1 ─       (s = 0.0)
│   %2  = (1:1000)::Core.Compiler.Const(1:1000, false)
│         (@_4 = Base.iterate(%2))
│   %4  = (@_4::Core.Compiler.Const((1, 1), false) === nothing)::Core.Compiler.Const(false, false)
│   %5  = Base.not_int(%4)::Core.Compiler.Const(true, false)
└──       goto #4 if not %5
2 ┄ %7  = @_4::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│         (i = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Any
│         (#3 = %new(Main.:(var"#3#4")))
│   %12 = #3::Core.Compiler.Const(var"#3#4"(), false)
│   %13 = Main.solver(%12, x)::Any
│         (s = %10 + %13)
│         (@_4 = Base.iterate(%2, %9))
│   %16 = (@_4 === nothing)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄       return s
```

There are a couple of issues with that example:

1. `f = x -> x^2` is a wacky way to write `f(x) = x^2`.
2. `s` is improperly initialized, which leads to type instability (this has nothing to do with the discussion at hand).
3. `solver(x -> f(x), x)` could just be `solver(f, x)` (after fixing item 1), which might also help with that type instability.
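Putting those fixes together, a sketch of the repaired example (same toy `solver` as above, with the function passed in as an argument so no non-constant global is touched):

```julia
f(x) = x^2              # named definition: `f` is a constant binding
solver(f, x) = f(x)     # toy solver, as in the example above

function g(x, func)     # take the function as an argument
    s = 0.0             # concrete Float64 accumulator
    for _ in 1:1000
        s += solver(func, x)
    end
    s
end

g(1, f)  # 1000.0, now type-stable
```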


I think the other issue is that `f` is a non-constant global variable within the definition of `g`.


Only your inner-loop code needs to be inside a function. If you are creating an anonymous function once, at top level in interactive code, then the compilation overhead is irrelevant.


I am not sure what you are trying to say here. This is something very fundamental to how Julia works, so users should pick it up, “tricky” or not. IMO most people do, once they get around to benchmarking their code; it is documented in detail.

Nothing really. I was confused by what I had seen in other threads, where the issues were type instabilities, not the closures themselves.

I was confused, specifically, with this possibility:

```julia
julia> f = x -> x^2
#1 (generic function with 1 method)

julia> solver(f, x) = f(x)
solver (generic function with 1 method)

julia> function g(x)
           s = 0.
           for i in 1:1000
               s = s + solver(f, x)
           end
           s
       end
g (generic function with 1 method)
```

Here `f` is passed inside `g` to the `solver`, and syntactically `f` appears to be a closure. This has been raised before in some threads here as generating a type instability that leads to bad performance. But the problem is the ill-defined `x` and the fact that `f` is a non-constant global.

Can I suggest that the best takeaway from this thread would be the value of having a page with guides on the best way to write Julia code for this use case? In many ways, it’s the Julia cookbook that’s most badly missing for new users. Cookbooks seldom involve comparisons between options: they usually just say, “to make an X, you should do Y1, then Y2, etc.”.


It is a closure. (That’s like saying `y = 3` “syntactically appears to be an integer.”) The value of `f` is fine, it’s just that it’s referred to by a global variable, so referring to that global variable in your inner-loop function `g` will be slow.

If you passed `f` as a parameter to `g` it would be perfectly fine:

```julia
julia> function g(x, func)
           s = 0.
           for i in 1:1000
               s = s + solver(func, x)
           end
           s
       end
g (generic function with 1 method)

julia> @btime g(3, $f);
  1.065 μs (0 allocations: 0 bytes)
```

Avoiding non-`const` global variables in inner loops is the one and only rule here.


Thanks. Maybe then a clarification on the difference between these would help me:

```julia
f = x -> x^2
```

and

```julia
f(x) = x^2
```

Is it that in the second case `f` is a constant global, while in the first it is not? Or is there some other fundamental difference?
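For what it's worth, a small sketch of that difference (run it in a fresh session, since a binding cannot change from non-constant to constant): a named method definition creates a constant global binding to a generic function, while the assignment form creates an ordinary global that can be rebound later, which is what defeats type inference.

```julia
f(x) = x^2          # named definition: `f` is a constant global binding
# f = 3             # would throw "invalid redefinition of constant f"

g = x -> x^2        # plain assignment: `g` is a non-constant global
g = 3               # allowed: `g` may be rebound to anything

const h = x -> x^2  # anonymous function under a const name:
h(4)                # behaves like the named version in hot loops
```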