More efficient function parameter?

I have a function that I frequently use

function expectation(f::Function; μ=0.0, σ=1.0) 

and it works great for

expectation(x -> (x-1)^2)

as well as many other examples. However, when I use a variable, e.g.,

avg=0;
expectation(x -> (x - avg)^2);

Using @time, I see that I go from 384k allocations to 11M allocations, because avg is a global variable. However, if I do

function test_me(avg)
    return expectation(x -> (x-avg+1)^2)
end
@time test_me(1)

this is reliably at just 27k allocations. Why is that even more efficient than both earlier cases? Is there a less clumsy way to do this than wrapping a function around every call of expectation in my Jupyter notebooks?

Declaring

const avg = 0  # or const avg = 0.0 if you would rather have a Float64
expectation(x -> (x - avg)^2);

instead should make the second example reasonably efficient, although you won't be able to change the type of avg afterwards.
I cannot say for sure why the last example is more performant than the first, but my guess is that the anonymous function

x -> (x -avg)^2

might get compiled anew for each call, so maybe that is the reason.
You could check whether this is correct by defining the function explicitly:

const avg = 0.
func(x) = (x-avg)^2
expectation(func)

On a final note, I believe functions with keyword arguments are less efficient than functions with only positional arguments, although this will probably not change much in your case.
Hope that helps.


You certainly should not use non-constant global variables.
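To illustrate the point about non-constant globals, here is a minimal sketch. The names `mean_on_grid`, `avg_global`, and `AVG_CONST` are made up for this example, and `mean_on_grid` is only a stand-in, not the `expectation` function from the question. It asks the compiler what return type it can infer for a closure reading each kind of global.

```julia
# Hypothetical stand-in for `expectation`: averages f over a fixed grid.
mean_on_grid(f) = sum(f, range(-3.0, 3.0; length=1001)) / 1001

avg_global = 0.0        # non-constant global: its type could change at any time
const AVG_CONST = 0.0   # constant global: its type is known to the compiler

slow = x -> (x - avg_global)^2   # has to treat avg_global as an unknown type
fast = x -> (x - AVG_CONST)^2    # can be specialized for Float64

# Inferred return types for a Float64 argument
slow_type = only(Base.return_types(slow, (Float64,)))
fast_type = only(Base.return_types(fast, (Float64,)))

println("non-const global: ", slow_type)
println("const global:     ", fast_type)
println(mean_on_grid(slow) ≈ mean_on_grid(fast))
```

Both closures compute the same value; the difference is only in what the compiler can prove about the result, which is what drives the extra allocations.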

And indeed, a closure defined outside a function will be compiled at every call. But note that compilation is probably only a significant part of the time if the function itself is very fast, in which case it is probably irrelevant for interactive use. If you will actually call this expectation function many times in a loop, it is better to enclose that in a function.


I don’t have an answer to that question, just a few observations in line with the foregoing discussion.

If the same definition for an anonymous function appears at different points in the code, they are not the same function:

julia> typeof(x -> 2x)
var"#33#34"

julia> typeof(x -> 2x)
var"#35#36"

julia> typeof(x -> 2x)
var"#37#38"

Let’s look at what happens when we use map. Every time map is called with different argument types, a new specialization of map is compiled. The tricky part here is that every function has its own type, and as you can see above, each anonymous function has its own type. So, if you call map(x -> 2x, 1:1_000_000) multiple times, it recompiles map every time (it’s not just the anonymous function that gets compiled). We can see this compilation going on by using @time (on Julia 1.6+):

julia> @time map(x -> 2x, 1:1_000_000);
  0.031083 seconds (48.47 k allocations: 10.478 MiB, 88.39% compilation time)

julia> @time map(x -> 2x, 1:1_000_000);
  0.030761 seconds (48.47 k allocations: 10.478 MiB, 88.54% compilation time)

julia> @time map(x -> 2x, 1:1_000_000);
  0.033818 seconds (48.47 k allocations: 10.478 MiB, 89.68% compilation time)

If you save the first anonymous function and reuse it, then you won’t incur the compilation cost in subsequent calls:

julia> f = x -> 2x
#9 (generic function with 1 method)

julia> @time map(f, 1:3);
  0.026009 seconds (48.45 k allocations: 2.849 MiB, 99.42% compilation time)

julia> @time map(f, 1:3);
  0.000011 seconds (4 allocations: 208 bytes)

julia> @time map(f, 1:3);
  0.000012 seconds (4 allocations: 208 bytes)

However, you shouldn’t normally need to worry about this too much, because if you’re planning to reuse an anonymous function, it’s probably because it’s part of a larger function, in which case the larger function (including the internal map call) will just get compiled once. I was trying to come up with a good example of that, but oddly enough I’m just getting examples where the compilation time is zero. (My guess is that it’s not zero, just very close to zero, so it doesn’t get printed.)

julia> function foo(a)
           b = 2a
           map(x -> 2x + b, 1:4)
       end
foo (generic function with 1 method)

julia> @time foo(1);
  0.000002 seconds (1 allocation: 112 bytes)

julia> @time foo(1);
  0.000002 seconds (1 allocation: 112 bytes)

A let block introduces a new scope, and should give performance equal to a function declaration.

let avg=0
    expectation(x -> (x - avg)^2)
end
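A small sketch of what the let block changes, assuming nothing about `expectation` itself: inside `let`, `avg` is a local variable, so the closure stores it as a concretely typed field instead of looking up a global. The name `g` is just for illustration.

```julia
# Build the closure inside a let block so that avg is a captured local.
g = let avg = 0.0
    x -> (x - avg)^2
end

# The closure type carries the captured variable as a typed field.
captured_names = fieldnames(typeof(g))
captured_type = fieldtype(typeof(g), :avg)
inferred = only(Base.return_types(g, (Float64,)))

println(captured_names)
println(captured_type)
println(inferred)
```

Note that a fresh anonymous function written at the prompt still gets its own type each time, so the first call after each new definition will still include compilation.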

Thanks for all the suggestions. For

@time expectation(x -> (x-1)^2)

avg = 1
@time expectation(x -> (x-avg)^2)

function test_me(avg)
    return expectation(x -> (x-avg+1)^2)
end
@time test_me(1)

const const_avg = 1
@time expectation(x -> (x-const_avg)^2)

f = x -> (x-avg)^2
@time expectation(f)
@time expectation(f)

@time let avg = 1
    expectation(x -> (x-avg+1)^2)
end

I get

  0.732484 seconds (891.41 k allocations: 72.566 MiB, 85.95% compilation time)
  1.348060 seconds (27.82 M allocations: 486.733 MiB, 12.28% gc time, 51.40% compilation time)
  0.363870 seconds (82.05 k allocations: 29.578 MiB, 14.95% gc time, 65.58% compilation time)
  0.626622 seconds (891.80 k allocations: 72.660 MiB, 83.85% compilation time)
  1.394146 seconds (27.82 M allocations: 486.883 MiB, 11.30% gc time, 45.93% compilation time)
  0.611226 seconds (26.89 M allocations: 436.018 MiB, 11.72% gc time)
  0.714353 seconds (902.22 k allocations: 73.082 MiB, 8.31% gc time, 87.21% compilation time)

This leads me to the conclusions that

  1. let is as good as the first case but worse than the third…
  2. Saving the anonymous function and calling it twice is as good as let in the second run and as bad as before in the first run.
  3. Wrapping a function around it gives a factor of 2 in speed.

I still don’t get (3).

If you did it exactly as shown (i.e. timing the first run), then you are measuring compile time, which is probably not what you want (unless you are trying to improve the compiler, of course).
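A rough way to separate the two, sketched with a hypothetical stand-in (`mean_on_grid` and `square_dev` are invented here, not the original `expectation`): time the same call twice and compare. The first timing includes compilation, the second is mostly run time.

```julia
# Hypothetical stand-in for `expectation`: averages f over a fixed grid.
mean_on_grid(f) = sum(f, range(-3.0, 3.0; length=1001)) / 1001

# A named function, so both calls below use the same function type.
square_dev(x) = (x - 1.0)^2

t_first = @elapsed mean_on_grid(square_dev)    # includes compilation
t_second = @elapsed mean_on_grid(square_dev)   # already compiled

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

This only works if the second call really reuses the same function. Writing the anonymous function out again creates a new type and triggers compilation again.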


Doing this (f = x -> (x-avg)^2) in the global scope makes f a non-constant global, and therefore type unstable. This notation should probably always be avoided at global scope. Use

f(x) = (x-avg)^2

which defines the same function, but as a constant binding.
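The difference between the two notations can be checked with `isconst`. This is only a sketch; `f_anon` and `f_named` are names chosen for the example.

```julia
avg = 1

f_anon = x -> (x - avg)^2   # ordinary global variable that happens to hold a function
f_named(x) = (x - avg)^2    # generic function definition

# Ask whether each global binding is constant in the current module.
anon_is_const = isconst(@__MODULE__, :f_anon)
named_is_const = isconst(@__MODULE__, :f_named)

println("f_anon constant?  ", anon_is_const)
println("f_named constant? ", named_is_const)
```

Keep in mind that both versions still read the non-constant global `avg` inside their body, so making the function binding constant fixes only half of the problem.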


Thanks for the great exercise! After changing to

@time expectation(x -> (x-1)^2)

avg = 1
@time expectation(x -> (x-avg)^2)

@time begin
    function test_me(avg)
        return expectation(x -> (x-avg+1)^2)
    end
    test_me(1)
end

@time begin
    const const_avg = 1
    expectation(x -> (x-const_avg)^2)
end

@time begin
    f_1 = x -> (x-avg)^2
    expectation(f_1)
end

@time begin
    f_2(x) = (x-avg)^2
    expectation(f_2)
end
@time expectation(f_2)

@time let avg = 1
    expectation(x -> (x-avg+1)^2)
end

@time let cavg = avg
    f_3(x) = (x-cavg)^2
    expectation(f_3)
end

@time begin
    const cavg = 1
    f_4(x) = (x-cavg)^2
    expectation(f_4)    
end

@time begin # Second run
    expectation(f_4)    
end

things make sense:

  0.567262 seconds (1.84 M allocations: 132.267 MiB, 7.34% gc time, 85.63% compilation time)
  0.951705 seconds (28.14 M allocations: 507.379 MiB, 6.45% gc time, 39.72% compilation time)
  0.320763 seconds (1.18 M allocations: 90.105 MiB, 77.55% compilation time)
  0.304999 seconds (891.75 k allocations: 72.613 MiB, 5.10% gc time, 76.83% compilation time)
  0.837391 seconds (27.82 M allocations: 486.748 MiB, 7.61% gc time, 35.02% compilation time)
  0.825175 seconds (27.82 M allocations: 486.741 MiB, 6.84% gc time, 34.12% compilation time)
  0.520320 seconds (26.89 M allocations: 436.018 MiB, 7.35% gc time)
  0.302803 seconds (902.16 k allocations: 73.113 MiB, 5.59% gc time, 74.56% compilation time)
  0.288606 seconds (897.09 k allocations: 72.957 MiB, 3.60% gc time, 76.29% compilation time)
  0.312757 seconds (891.75 k allocations: 72.641 MiB, 7.05% gc time, 74.87% compilation time)
  0.076398 seconds (15.64 k allocations: 25.528 MiB, 11.04% gc time)

That was fun!

I’m somewhat lost here, but you are measuring compilation time in most cases. I suggest using BenchmarkTools. For a quick overview of how to use it, see: Benchmark · JuliaNotes.jl
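Something along these lines, as a sketch only: it assumes the BenchmarkTools package is installed, and `mean_on_grid` is a made-up stand-in for the `expectation` function from the question. The `$` interpolates the value into the benchmark so that the global lookup is not part of what gets timed.

```julia
using BenchmarkTools   # assumed to be installed: ] add BenchmarkTools

# Hypothetical stand-in for `expectation`: averages f over a fixed grid.
mean_on_grid(f) = sum(f, range(-3.0, 3.0; length=1001)) / 1001

avg = 1.0

# Capture the current value of avg as a local, so the closure is type stable.
f = let a = avg
    x -> (x - a)^2
end

# @btime runs the call many times and reports the minimum time,
# which excludes the one-off compilation cost.
@btime mean_on_grid($f)
```

`@benchmark` can be used in place of `@btime` to get the full distribution of timings.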
