Why don't free variables in a function body get bound at function definition time?

Hi,
What is the advantage of having such a difference between how variables use other variables and how functions use them?

# Assigning to variable
julia> a=1;

julia> b=a
1

julia> a="asdf";

julia> b
1
# OK, as expected

# Assigning to function
julia> a=1;

julia> fun()=a
fun (generic function with 1 method)

julia> fun()
1

julia> a="asdf";

julia> fun()
"asdf"

It doesn’t seem good that calling the function again returns a different value, and that you cannot predict what the function will return until you run the code, unlike in the case of assigning to a variable.

EDIT: after reading the first replies: above I wrote “Assigning to a function”, which is misleading; I meant defining the function to use another variable.

In Julia, a name is a “label”; it can stick to concrete objects.
a = 1 means you associate the “label” a with the concrete object 1.
About fun() = a: you shouldn’t write a function like this.
A function takes some input and generates the output accordingly,
or takes no input and generates something, such as time().

You are missing the point. It’s not about whether a function has parameters, but about its return value (or its effect, for that matter) not being fixed/determined by the state of the program at the time the function is defined, nor by whatever arguments will be passed (if any), which makes its behavior harder to predict. Unlike the value of a variable.

julia> a=1;

julia> fun(x)=x+a;

julia> fun(0)
1

julia> a=2; fun(0)
2

This follows directly from Julia’s use of lexical scope: the a in the functions you defined refers to the global variable a, not to the value of that variable at definition time.

I think it can help to realize that

fun() = a

isn’t assigning the value of a to a function, but rather is syntactic sugar for

function fun()
   return a
end

If you really want the behavior you described, then you need to close over / capture the value at definition time, i.e. you need a closure. You can do this with a let block:

julia> a = 1
1

julia> fun = let a2 = a
       function()
       return a2
       end
       end
#3 (generic function with 1 method)

julia> fun()
1

julia> a = 2
2

julia> fun()
1

julia> a2
ERROR: UndefVarError: `a2` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

I would never write code like this. (And I don’t suggest anyone do).
I either would do

a = 1 # then I'll never change its value
function f(x)
    return x + a
end

or I would do

function f(x, a)
    return x + a
end

The latter case is more advisable.

Another example that might be instructive would be

julia> a = [1, 2]
2-element Vector{Int64}:
 1
 2

julia> v = [a for _ in 1:3] # construct an object using a name
3-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]
 [1, 2]

julia> a[1] = 7;

julia> v
3-element Vector{Vector{Int64}}:
 [7, 2]
 [7, 2]
 [7, 2]

julia> v_stable = [[1, 2] for _ in 1:3] # construct an object using concrete objects
3-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]
 [1, 2]

julia> v_stable[1][1] = 7;

julia> v_stable
3-element Vector{Vector{Int64}}:
 [7, 2]
 [1, 2]
 [1, 2]

Yes, so simply put: the function fun looked up the value that the variable a holds in the environment where the function was defined (the global scope of your REPL), since you didn’t assign anything to that variable within the function. Allowing this is useful for many reasons. If you must use global variables at all, they should be made constants with const a = 1 to prevent this kind of surprise (and the associated performance penalty).
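A minimal sketch of the const approach (the names c and f here are my own, not from the thread):

```julia
const c = 1      # the binding c may not be reassigned (to a different type)

f() = c          # the compiler can now assume c when compiling f

f()              # returns 1
```

Trying `c = "asdf"` afterwards would be rejected, so f’s behavior stays predictable.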

Just wanted to add that const is not bulletproof. What I mean is that the name itself will not be allowed to change what it references, but if the type is mutable, the overall value can still change:

julia> const a = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> a[1] = 2
2


Forgot to mention

julia> a = [1, 2];

julia> v = [a for _ in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2]
 [1, 2]
 [1, 2]

julia> v[1][1] = 6; # I only modified first(v)

julia> v # but v[:] all changes
3-element Vector{Vector{Int64}}:
 [6, 2]
 [6, 2]
 [6, 2]

@palday
The initial title was misleading, hopefully clearer now.
Thanks for explaining what’s a “closure”, I didn’t know.

So, my question then is: why don’t functions behave like closures by default?
If they captured the values of outer/“free” variables automatically at definition time, that would force any dependency the function has to be explicit, through its parameters/arguments.

It’s a question of language design, I suppose.
I did notice that Julia is like Python and C in this respect.
@eteppo , can you please give some examples of why the current function behavior is useful? By the way, I showed this at the REPL, but essentially the same happens inside a local scope:

julia> function bar()
           a=0
           fun() = a
           println(fun())
           a=1
           println(fun())
       end
bar (generic function with 1 method)

julia> bar()
0
1

You’re right, it’s exactly why it’s often said that “globals are evil.” Thing is, there are uses for global state. There are many recommendations for how functions should behave, and none of them are hard rules. Global variables feature in most programming languages, use them wisely.

They do; the previous comment is misleading. A closure may perform as if it captures values when the captured variables cannot be reassigned, but at the language level, closures always capture variables. You get the same performance benefit for globally scoped methods if you use const global variables. Closures and global methods always reference external variables in their bodies.
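To illustrate “closures capture variables, not values”, here is a hypothetical counter (make_counter is a made-up name):

```julia
# The returned closure captures the *variable* n, not its current value:
make_counter() = let n = 0
    () -> (n += 1)   # each call mutates the captured binding
end

counter = make_counter()
counter()   # 1
counter()   # 2 — the captured binding was updated between calls
```

If closures copied values at definition time, both calls would return the same number; sharing the binding is what makes stateful closures like this possible.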


If you are only interested in the value of a variable at a specific moment, you can always pass that value as an argument to a function. The point is that sometimes functions do want to access (and maybe modify) global state; think for example of global_logger, Random.seed! or Plots.plot!, which would not be able to work as intended otherwise.


You could define a semantic that works like that, but no language I’m aware of actually does that. Let me try to add some background:

Purely functional programming languages basically only work with values. While you can give names to values, these can always be replaced by the corresponding value; e.g., a = 2 and b = 3 + a means that b = 3 + 2 in every context. In particular, in such a language – e.g. Haskell – constants and functions with no arguments are just the same. Yet, since names can always be replaced by their definitions, no notion of mutation can be defined, i.e., what should the value of x = x + 1 be if you must be able to simply replace x by its definition?

Most languages allow for mutation, i.e., changing the value assigned to some name. In this case, we need to distinguish between bindings and values. Think of bindings as mappings between names and values valid in a certain context – called an environment – i.e., a = 2 establishes a binding (if a was not defined already) between the name a and the value 2. If a binding already exists in some environment, a = 3 would change the value assigned to the same name in that environment.
Now, what does b = a do? In Julia this binds the value of a (in the current environment) to the name b. Thus, the semantics is that reading a name looks up its (current) value. The same is true in a function, but the lookup is delayed and executed when the function is called (not when it’s defined). What makes lexical closures so useful is that they close over the binding and not the value. This allows multiple functions to share a binding in a single environment. A quick and dirty object system could be built on top of that:

obj = (let a = 1 # New binding in fresh environment
    inc() = a += 1
    dec() = a -= 1
    get() = a
    (; inc, dec, get)
end)

obj.get() # 1
obj.inc(); obj.inc(); obj.dec(); obj.inc();
obj.get() # 3

In this respect, the lhs and rhs of an assignment behave differently: the lhs writes to a binding while the rhs looks up its value.

Finally, some languages – such as Rust – allow you to distinguish explicitly between a binding and its value via reference semantics (also search for lvalue/rvalue):

fn main() {
    let mut x = 10;
    println!("One: {}", x);  // prints 10

    let mut z = x;  // Another name for value of x
    z += 2;
    println!("Two: {}", x);  // prints 10 again
    
    let y = &mut x; // Another name for x itself
    *y += 1;
    
    println!("Three: {}", x);  // prints 11
}

Full disclosure: I was curious whether, when Julia was designed, the question I posed was raised at all, but it looks like this is rather the traditional, historical way functions are designed. The post may belong in the “Offtopic” category.

Interesting; maybe there should be one?

Let’s say functions in the hypothetical language were such that they always capture the values of free variables (global or not) at definition time.
If a function’s job were to increment the global a by a given increment, you might have to define and call it like this:

function update(anew, increment) 
   global a = anew+increment
end
#...
# call:
update(a, 17)   # or maybe `update(global a, 17)` 

(EDIT: originally, instead of the parameter anew, I wrote just a, but this is now fixed, as Benny pointed out that in most languages that would conflict with the following global a.)

And so, you would get total transparency as to what the effect of the function depends on.
Any major disadvantages?

My curiosity is not only about this being more transparent/predictable semantics for functions; I was also reading the other day on this Discourse about discussions of scoping rules, and one of the reasons brought up for the current rules is that they facilitate creating closures, say, in for loops.
Maybe the above could be a solution, though I guess in a rather hypothetical future language, not Julia.

Note that the argument is also valid for any other definition, such as that of any function, like rand. Should every function capture the current definition of every function it calls, as if it inlined everything? That does not seem reasonable. Special-casing other values would probably create more confusion than it removes.

(post deleted by author)

Could you explain a bit more, what’s with rand or what can go wrong?

Your update example can already be done in languages with pass-by-reference, e.g. Rust; no need to change the handling of free variables, since the reference is passed as an explicit argument, i.e., it is not free anyway.

Transparency is good, and some languages such as Haskell explicitly track side effects already. For the current case, i.e., global variables, I believe the gains would be marginal, but you lose a lot of power – OOP is basically built around mutation of shared bindings, and lexical closures are equally powerful (see above).
If you want to retain the value at definition an easy fix is to simply rebind the variable and capture that:

a = 10

let a = a  # rebind
    global tada() = a
end

tada() # 10
a = 12
tada() # still 10

Formally still mutable, but no other binding is shared with the new environment.

I mean this:

julia> g() = 1
g (generic function with 1 method)

julia> f() = g()
f (generic function with 1 method)

julia> f()
1

julia> g() = 2
g (generic function with 1 method)

julia> f()
2

That wouldn’t fly in any language: you have two different variables, one local and one global, trying to use the same name in one scope. Assuming you named the argument b instead, that syntax is already necessary for reassigning global variables from a method. Definitions capturing values wouldn’t change anything.
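For reference, that already-existing syntax looks like this (update! and b are my own names):

```julia
a = 1

function update!(b)
    global a = a + b   # `global` is required to reassign the global binding a
end

update!(17)
a   # now 18
```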

You actually get less transparency. Right now, definitions take definition-time values instead of referencing variables in a couple ways:

  1. Not everything in the method definition is assigned and executed during the call. The type annotations run during the definition, so it just takes the values then and totally forgets about the variables after.
  2. With metaprogramming, you can interpolate values right into expressions. When evaluated, the method body just stores or references those values directly, not the variables you used to make the expression.
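Point (2) can be sketched with @eval, which splices the current value of a into the expression before it is evaluated (fun2 is a made-up name):

```julia
a = 1

@eval fun2() = $a   # the method body stores the value 1, not the variable a

fun2()   # 1
a = 2
fun2()   # still 1 — reassigning a has no effect on the stored value
```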

Problem is, how do you know those definition-time values after the global variable is reassigned independently? For (1), you need to use the reflection function methods and spot the right method signature. For (2), you need to use the reflection macros starting with @code_. Either way, you just get printouts, not a value or a reference you can interact with. The only way for you to access that method-internal value is for the method to return it to you, but 1) that’s not always what you want the method to do, and 2) running the method can do many other things that you don’t need at the moment, including changing that internal value. Checking a global variable for its current state would’ve been much easier.

And again, people don’t actually care about whether method bodies copy values from global variables at definition time, that’s just language semantics. All the practical things can be pulled off with input values. If you can’t do that through arguments, you can instantiate functions (or other callables) that contain values, long after their methods were defined. Global variables are just easier when you really need access to be global, and definition-time copied values are the opposite of that.
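For example, a callable struct (a “functor”; Adder is a made-up name) lets you bake a value into an instance long after its method was defined:

```julia
struct Adder
    a::Int               # the value lives in the instance, not in the method
end

(f::Adder)(x) = x + f.a  # one method definition serves all future instances

add_one = Adder(1)       # capture the value you want at construction time
add_one(0)               # 1, independent of any global also named a
```

Each instance carries its own value, so you get definition-time-like capture per instance without any special language rule for free variables.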


Yes, the disadvantage is losing flexibility. Haskell is a great example for this. If you have a solid design which does not change, you get a really robust program.

However, assume you have a bug and you want to quickly add a println (or any IO) to your method (which would not print otherwise). IO inherently needs to modify global state (look up Monads to learn everything you ever wanted to know (or not) about tracking state changes). So it would need to be an additional input parameter for your method.

However, the calling method would now need to provide this input parameter and has the same problem, so the calling method of the calling method would need to provide it, too. Depending on your call stack you quickly need to modify twenty methods just to rather-not-so-quickly print a value.

This is not only true for quick debugging printlns, it is also true for all other state changing things as @lmiq mentioned. You need random numbers? Now a lot of methods need to transport the state of the random generator via input/output parameters. Want to modify files? Another parameter for a lot of your methods. Want to use a task or a process? Yet one or two more parameters for a lot of methods.

This very quickly gets to the point where most programmers realize that they are much faster at solving the problems arising from hidden state changes than at making all state transparent everywhere.

Note that there is some state where it can be faster to make it transparent; that’s why languages like Rust exist and are successful. And note that different applications have different tradeoffs, which is why different programming languages exist. However, it’s mostly a question of which state changes should be transparent because they have a beneficial cost-benefit ratio and which should not because it’s not worth it, rather than making everything transparent or nothing.