A question about redefining a function in Julia

I think “capturing a name within a function” is a somewhat nontrivial thing.

A safer design would be bypassing “capture and then re-bind” completely, e.g.

julia> const ref = Ref{Function}(x -> x+1);

julia> function g(; ref = ref)
           f = ref.x
           f(1)
       end;

julia> g()
2

julia> setfield!(ref, :x, x -> x+2);

julia> g()
3

Edit: the ref = ref in the definition of g is redundant, since ref is already a const, and accessing a const global is fast.

Oh my god. I didn’t think of this. Did you mean that

function f(; x = x/2, y = x)
    @show x y    
end

boils down to

function f()
    x = x/2
    y = x
    @show x y    
end

? If this is the case, I think it’s terrible, because the first version looks clean and I expected it to behave cleanly. The second version apparently violates the “avoid untyped global variables” performance tip.
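For what it’s worth, this left-to-right evaluation can also be used deliberately: a later keyword default may reference an earlier keyword. A minimal sketch (the function name f2 is made up for illustration):

```julia
# Keyword defaults are evaluated left to right, like sequential assignments,
# so y's default sees the keyword x rather than any global x.
function f2(; x = 1, y = x + 1)
    return (x, y)
end

f2()         # (1, 2)
f2(x = 10)   # (10, 11)
```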


I asked AI to give me a reformulation of

function f(w; x = x, y = y)
    return w, x, y
end

, which gave me

function f(w; kwargs...)
    local x = haskey(kwargs, :x) ? kwargs[:x] : Main.x
    local y = haskey(kwargs, :y) ? kwargs[:y] : Main.y
    return w, x, y
end

So I guess that the definition f(w; x = x, y = y) is problematic in terms of performance.

There are many issues in this context…

But whatever, for writing daily code there is only one thing that matters to me:
this “const and no arg” style is already fast:

const x = rand(1000)
function loop_over_global()
    s = 0.0
    for i in x
        s += i
    end
    return s
end
loop_over_global()

There is no need to write a standard function like the following

function loop_over_global(x)
    ...
end
loop_over_global(x)

because in practical code this style becomes so cumbersome that one can ill afford it.

I did a test, which suggests that “const” is the fastest, even faster than the standard definition.

test
import Statistics

a = rand(100000);
b = rand(100000);
c = rand(100000);
d = rand(100000);
e = rand(100000);
f = rand(100000);
g = rand(100000);
h = rand(100000);
i = rand(100000);
j = rand(100000);
k = rand(100000);
l = rand(100000);
m = rand(100000);
n = rand(100000);
o = rand(100000);
p = rand(100000);
q = rand(100000);
r = rand(100000);
s = rand(100000);

function loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
    s0 = -1.797693134862315e308
    for vec = (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s), ind in vec
        s0 += ind
    end
    return s0
end

loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
tvec = [];
for _ = 1:100
    t = time()
    loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
    push!(tvec, time()-t)
end
Statistics.mean(tvec)

It has its uses. Say you create a module with some computation which may take a precision as input, and uses that to compute the number of basis functions used in the computation. However, not being infallible, you allow the caller to specify the number of basis functions directly. Moreover, since this is not typically done, you make these keyword arguments. You’d do something like this:

function computation(x, y; precision=√eps(x), Nbasis=nfuncs(precision))
    ...
end

The caller of the function may not be aware of the function nfuncs; it’s an implementation detail. So the caller may happen to define their own nfuncs for some entirely unrelated work. You wouldn’t want that to interfere with the default number of basis functions: the default should use your nfuncs (and eps).
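To make that concrete, here is a runnable sketch; this nfuncs is a made-up stand-in for the real implementation detail:

```julia
# Hypothetical implementation detail: map a requested precision
# to a number of basis functions.
nfuncs(precision) = ceil(Int, -log10(precision))

# The defaults are evaluated in the function's own (module) scope,
# so a caller-defined nfuncs would not interfere with them.
function computation(x, y; precision = sqrt(eps(x)), Nbasis = nfuncs(precision))
    return (precision, Nbasis)
end

computation(1.0, 2.0)              # default precision and basis size
computation(1.0, 2.0; Nbasis = 5)  # override only the basis size
```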


How would you interpret

function f(; x = x/2, y = x)
    @show x y    
end

other than using some outer variable x for computing x/2?

Potentially because of how the outer x and y are defined (like you mention), but not due to the haskey etc., as you can easily benchmark. The reason is that kwargs will be a (Pairs wrapper around a) NamedTuple for which the haskey and getindex (getfield) can just be compiled away.
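A quick sketch of the compile-away claim (grab is a made-up name): for a NamedTuple the keys are part of the type, so the branch costs nothing at runtime:

```julia
# kwargs... arrives as a Pairs wrapper around a NamedTuple; its keys are
# part of the type, so haskey/getindex involve no runtime dictionary lookup.
grab(; kwargs...) = haskey(kwargs, :x) ? kwargs[:x] : 0

grab(x = 42)  # 42
grab()        # 0
```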

Yes, I understand it now.

That grammar is indeed a bit tricky: it should be understood as x = x/2; y = x rather than
x, y = x/2, x.


According to my benchmark (with the instance attached in my #23 post), the performance of
the style “f(; x = x, y = y)” is virtually the same as the standard style (where you explicitly pass all args), their results are around 0.00163 sec on my computer.

Whereas the “const + no arg” style is not only a lot more concise to read but also runs faster: 6.81496e-5 sec on my computer.

Maybe you’re right.

As far as I can tell you only provided code for your loop_over_global(<19 non-const positional arguments>) timing?

For the 0/1 (kw)arg version, i.e. with const x = rand(1000) and

  • function loop_over_global() ... end
  • function loop_over_global(x) ... end
  • function loop_over_global(; x = x) ... end

all of

  • @benchmark loop_over_global()
  • @benchmark loop_over_global(x)
  • @benchmark loop_over_global($x)
  • @benchmark loop_over_global(; x)
  • @benchmark loop_over_global(; x=$x)

(tested at the appropriate time) give me the same result (namely 910 ns mean execution time).


Thank you for really testing that. I didn’t use any benchmark packages, just practical timing: I execute the scripts in my shell (zsh on Linux).

The three source files are:

kwarg.jl

import Statistics
import Random
Random.seed!(1)

a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s = (
    rand(1000) for _ = 1:19
);

function loop_over_global(; a = a, b = b, c = c, d = d, e = e, f = f, g = g, h = h, i = i, j = j, k = k, l = l, m = m, n = n, o = o, p = p, q = q, r = r, s = s)
    s0 = -1.797693134862315e308
    for vec = (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s), ind in vec
        s0 += ind
    end
    return s0
end;

loop_over_global()
tvec = [];
for _ = 1:10000
    t = time()
    loop_over_global()
    push!(tvec, time()-t)
end
println("kwarg_in_def> $(sum(tvec))")

arg.jl

import Statistics
import Random
Random.seed!(1)

a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s = (
    rand(1000) for _ = 1:19
);

function loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
    s0 = -1.797693134862315e308
    for vec = (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s), ind in vec
        s0 += ind
    end
    return s0
end;

loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
tvec = [];
for _ = 1:10000
    t = time()
    loop_over_global(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s)
    push!(tvec, time()-t)
end
println("standard_arg> $(sum(tvec))")

const.jl

import Statistics
import Random
Random.seed!(1)

const a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s = (
    rand(1000) for _ = 1:19
);

function loop_over_global()
    s0 = -1.797693134862315e308
    for vec = (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s), ind in vec
        s0 += ind
    end
    return s0
end;

loop_over_global()
tvec = [];
for _ = 1:10000
    t = time()
    loop_over_global()
    push!(tvec, time()-t)
end
println("const_no_arg> $(sum(tvec))")

As you can see, the standard grammar in arg.jl is very cumbersome. The const.jl is both clean and fast.
(One last thing to add: if you drop the const annotation in const.jl, the runtime grows to around 18 sec.)

Edit: Oh, I see. We can make arg.jl faster by adding const (I did that in constarg.jl).


So to sum up: we should add the const annotation to globals whenever possible for performance. With that, we can write concise and fast no-arg functions.

I’m sorry the SNR is a bit low. I give a recap here:

In practice it would be ideal to define and call standard functions (in the definition, all names come from the argument list; at the call site, we pass all arguments explicitly). But this style soon becomes very cumbersome. Imagine a function with a long argument list that also needs to be wrapped in a “shell” function: the shell function then has to repeat the whole argument list.

Instead of writing

f = x -> x+1
g() = f(1)
f = x -> x+2
g()

as in #1 post (which is a non-standard function since f is not an arg of g), we can write

const ref = Ref{Function}(x -> x+1);
function g()
    f = ref.x
    f(1)
end;
g()
setfield!(ref, :x, x -> x+2);
g()

. Although g is not a standard function, ref is a const, which retains performance (as suggested in my #28 post).

Yes, this is exactly what I was confused by, as both c and g() are defined by the equal sign, but they behave differently.

Your arg.jl still uses non-const globals, so it’s not surprising that it takes time; there will be dynamic dispatch of the loop_over_global. Moreover, when there are more than a few arguments, a function call does not pass the arguments in registers, so it also takes time.

But, of course, constant globals will be fast.
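A minimal sketch of that inference gap (variable names invented here): the compiler cannot assume the type of a non-const global, while a const global infers fully:

```julia
data_nc = rand(3)        # non-const global: its type may change at any time
const data_c = rand(3)   # const global: the binding's type is fixed

sum_nc() = sum(data_nc)
sum_c()  = sum(data_c)

# Inferred return types, as seen by the compiler:
Base.return_types(sum_nc, Tuple{})  # [Any]      -> dynamic dispatch at runtime
Base.return_types(sum_c,  Tuple{})  # [Float64]  -> fully static
```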


The equal sign is just a syntactic variant of a function definition. I.e.

function g(x)
   f(x)
end

means exactly the same as

g(x) = f(x)

My confusion actually comes from my experience with Mathematica, which I used more before. In Mathematica, one can explicitly specify whether expressions involving other functions are evaluated immediately or not at the time of defining a function.
For example:

f[x_]:=x+1
g[x_]=f[2] (*define g[x] with f[2] immediately evaluated*)
h[x_]:=f[2] (*f[2] is not immediately evaluated when h[x] is defined until h[x] is called*)
g[1] (*g[1]=3*)
h[1] (*h[1]=3*)
f[x_]:=x+2
g[1] (*still g[1]=3*)
h[1] (*now h[1]=4*)

It seems that in Julia a function is defined without any expression being evaluated, just like the way h was defined. That puzzled me, because the equal sign = can be used to define both a variable and a function, which nevertheless behave differently.
In addition, I was wondering if there is a similar way in Julia to specify whether expressions are evaluated immediately or not when defining functions.

I understand now. I think the same symbol (the equal sign) in both definitions of a variable and a function puzzled me.


There is no such direct way in Julia. It is possible indirectly, with @eval and “$-interpolation”:

julia> f(x) = x+1

julia> @eval g(x) = $(f(1)) + x;

julia> g(0)
2

julia> f(x) = x+2;

julia> g(0)
2

There are also other ways to incorporate some precomputed constant via a “closure”, i.e. a function with “hidden” parameters. Below, add is a function which returns a function which adds a:

julia> function add(a)
           return (x -> x + a)
       end
add (generic function with 1 method)

julia> f = add(1)
#add##0 (generic function with 1 method)

julia> g = add(2)
#add##0 (generic function with 1 method)

julia> f(0)
1

julia> g(0)
2

Please don’t use @eval for this. A better approach, especially for beginners, is to use a temporary let binding.

julia> x = 0
0

julia> f() = x+1
f (generic function with 1 method)

julia> f()
1

julia> x = 10
10

julia> f()
11

julia> let x=x
           global g
           g() = x + 1
       end
g (generic function with 1 method)

julia> g()
11

julia> x = 100
100

julia> g()
11

julia> f()
101

The same approach now works if you want to “freeze” the evaluation of a function too. You just evaluate it and reference that result.

julia> let result=f()
           global h
           h() = result+1
       end
h (generic function with 1 method)

julia> h()
102

julia> f() = 42
f (generic function with 1 method)

julia> h()
102

Come to think of it, $-interpolation in function definitions would be handy, though I think there might be some gotchas with it.

There have already been a number of topics on how to organise many method arguments, such as

and we are starting to get off-topic here, but in general you would just group your arguments into (one or multiple) structs, (named) tuples, …

This example is also very artificial. You destructure your generator into 19 variables, which you then later combine again. Instead you could have just defined (e.g.) vec = Tuple(rand(1000) for _ = 1:19) and function loop_over_global(vec) ... end.
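For illustration, such a restructuring might look like this (names chosen here for the sketch):

```julia
# One tuple of vectors replaces 19 separately named globals.
vecs = Tuple(rand(1000) for _ = 1:19)

function loop_over_vecs(vecs)
    s0 = 0.0
    for v in vecs, x in v
        s0 += x
    end
    return s0
end

loop_over_vecs(vecs)  # one call, one argument, fully type-stable
```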
