I noticed that @A_Dent kept saying “closure capture the value”, but just to make sure: closures capture variables, not objects, right (the docs seem to say so)? If they captured objects, it doesn’t seem like you’d even need scopes and new variables each iteration to avoid the repeated value bug; you could just inline the integers into the function.
Now that I think of it, why don’t closures capture values instead of variables?
It’s a good question and I agree that on the surface it seems like it’d make things simpler, but it would actually be an exception to how scopes generally work. The crux is that you want all of these constructs to behave the same:
julia> function f1()
           x = 1
           for i in 1:10
               x = x+1
           end
           return x
       end
f1 (generic function with 1 method)
julia> function f2()
           x = 1
           [(x = x+1) for i in 1:10] # surprise, this is a closure!
           return x
       end
f2 (generic function with 1 method)
julia> function f3()
           x = 1
           map(i->(x = x+1), 1:10)
           return x
       end
f3 (generic function with 1 method)
julia> function f4()
           x = 1
           g(i) = (x = x+1)
           map(g, 1:10)
           return x
       end
f4 (generic function with 1 method)
julia> f1() == f2() == f3() == f4() == 11
true
Indeed, if closures — and functions generally — captured by value then you’d no longer be able to use non-constant globals! The way I keep my mental model straight is by remembering that closures aren’t special. If they share a name with an outer scope, then that name is consistently used. In the example above, I’m simultaneously using the value of a name and changing what that name should identify, but you could separate the two:
julia> function f5()
           x = 1
           g = () -> x+1
           for i in 1:10
               x = g()
           end
           return x
       end
f5 (generic function with 1 method)
julia> f5()
11
That’s a design direction to think about. If we accept that we want loop scopes and closure scopes to behave similarly so that we can freely move back and forth between `for` and `foreach` and comprehensions (which are secretly closures), then let’s think through the consequences. Suppose we have this for loop:
t = 0
for i = 1:10
    t += i
end
Based on the equivalence assumption, this should behave the same way as this:
t = 0
foreach(1:10) do i
    t += i
end
Now, if closures captured variables by value, then `t` in the closure would be a new `t` that is initialized to the value of the outer `t`. That means that the outer `t` is unaffected by the assignment in the closure / loop body, so the effect of the entire `for` loop or `foreach` call is nothing: the outer `t` is never changed and at the end it’s still zero.
It seems pretty clear that the vast majority of potential users would not be happy with that. So either this approach is a dead end or we decide to break the assumption that loops and closures behave similarly, in which case the `for` loop version could modify `t` while the `foreach` version could leave it unchanged. Personally, I think that a lot of the utility of closures is that they allow you to do things like implement a for loop with user-defined code, so I’m very reluctant to break that kind of principle.
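The equivalence argued for above can be checked directly. This is a minimal sketch: the function names are made up for illustration, and everything is wrapped in functions so that top-level (global) scope rules don’t get in the way. The third function only emulates what capture by value would do, by copying `t` into a fresh binding with `let`.

```julia
# A `for` loop updates the enclosing local `t`.
function sum_with_for()
    t = 0
    for i in 1:10
        t += i
    end
    return t
end

# The do-block is a closure; it captures the variable `t`,
# so the assignment inside it updates the same `t`.
function sum_with_foreach()
    t = 0
    foreach(1:10) do i
        t += i
    end
    return t
end

# Emulating capture by value: `let t = t` creates a new `t`
# initialized from the outer one, so the outer `t` never changes.
function sum_with_value_copy()
    t = 0
    foreach(1:10) do i
        let t = t
            t += i
        end
    end
    return t
end
```

Under current Julia semantics the first two agree, while the emulated by-value version leaves the outer `t` at zero, which is the outcome described above.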
So in short, closures (like any top-level block) capture variables because that’s the only way they can make any changes to those variables. The only case where it seems easier to capture values is when the value of that variable never changes, like `for i = 1:5 push!(fns, () -> i) end`.
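To make that case concrete, here is a small sketch (function names are hypothetical). A `for` loop gives each iteration its own `i`, so each closure sees a different variable; a `while` loop that updates a single `i` shows the “repeated value” behavior, because every closure captures the same variable.

```julia
# Each iteration of `for` has a fresh `i`, so each closure captures its own variable.
function closures_from_for()
    fns = Function[]
    for i = 1:5
        push!(fns, () -> i)
    end
    return [f() for f in fns]
end

# A single `i` is shared by all closures; they all see its final value.
function closures_from_while()
    fns = Function[]
    i = 1
    while i <= 5
        push!(fns, () -> i)
        i += 1
    end
    return [f() for f in fns]
end
```

Here `closures_from_for()` yields the values 1 through 5, while `closures_from_while()` yields 6 five times, since `i` is 6 once the loop has finished.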
Right. That’s an interesting observation: capture by value does generally seem more convenient/intuitive when you only want to read values from an outer scope inside a closure; when you want to modify a value outside a closure, capture by value doesn’t work, since you can’t modify the outer binding. The motivations for small scopes generally only come from the former use case (reading but not writing outer variables).
Something I don’t like about the implicit capturing of variables in outer scopes is that it allows this kind of bug to slip in:
function check_computation()
    function computation(a, b)
        # ... some difficult computations ...
        result = a + b + 1 # bug: we were meant to return `a + b`
        # ... some other computations
        return result
    end
    a, b = 10, 20
    result = a + b # the expected result, computed in some independent way to double-check
    computed = computation(a, b)
    if result == computed
        print("They agree:\nexpected = $result\ncomputed = $computed")
    else
        print("They disagree:\nexpected = $result\ncomputed = $computed")
    end
end
I think that at first it may be surprising to realize that `check_computation()` prints the erroneous
They agree:
expected = 31
computed = 31
and does not detect the bug in `computation`. The behavior would be different if `computation` were defined outside the scope of `check_computation`: in that case the bug would be detected with the output
They disagree:
expected = 30
computed = 31
This makes it very hard, almost impractical, to refactor and move functions in and out of other functions’ scopes without introducing subtle bugs. The only safe way I can see is to declare `local` every variable of an inner function (unless we intentionally want to capture it, of course).
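As a sketch of that workaround, the two hypothetical functions below return the `(expected, computed)` pair instead of printing it. The only difference between them is the `local result` declaration in the inner function.

```julia
# Without `local`: the inner assignment overwrites the outer `result`.
function check_captured()
    function computation(a, b)
        result = a + b + 1   # same variable as the outer `result`
        return result
    end
    a, b = 10, 20
    result = a + b
    computed = computation(a, b)
    return (result, computed)
end

# With `local`: the inner function gets its own `result`.
function check_with_local()
    function computation(a, b)
        local result         # fresh variable, shadows the outer one
        result = a + b + 1
        return result
    end
    a, b = 10, 20
    result = a + b
    computed = computation(a, b)
    return (result, computed)
end
```

`check_captured()` returns `(31, 31)`, hiding the bug, while `check_with_local()` returns `(30, 31)`, exposing it.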
“The only safe way I can see is to declare `local`”
Yes, but what’s the point? Evidently you’re nesting a function because you want/need enclosing scope for some reason, which means you also accept the risk of accidentally capturing unintended variables.
There seem to be a couple of ways around this. First, you could and should just say `local`, just as you say. Second, you could elect not to nest the function, and instead have a separate `computation` that receives all its information as explicit arguments. That would be safe, just a bit annoying, although you could just define a closure in `check_computation`.
Nobody forces you to use nested functions, and when you decide to, unfortunately you have to accept the risks. I generally avoid nesting except for very short functions, preferably anonymous, where there aren’t a lot of extra (implicitly local) variables floating around, and it’s clear that the intent is some sort of closure.
Yes, the “reason” of course is that the scope is used to control the visibility of the function. That’s the whole purpose of scopes, so I’m not abusing it in any way. In some cases it may make sense conceptually to define a function inside another.
Whether to put something in a scope or another should be determined by where you want that something to be visible. It is not a universal necessity that putting it inside another function should be associated with some risk. It is just an unfortunate fact in Julia: the scope in which you define a function changes its meaning.
- In Python, variables in the outer scopes are visible in the inner one, but you cannot assign to them unless you use the qualifier `nonlocal`, which nicely draws attention to the fact that something fishy is going on (the inner function has turned into a closure and is capturing some environment).
- In Rust there is no risk whatsoever. Inner functions do not secretly capture outer variables, so it is completely safe to move functions around wherever you please, and the scope where you put them only determines their visibility, not their meaning. If you want to capture some outer variable, there is a dedicated syntax for closures.
Maybe a dedicated syntax for distinguishing functions from capturing closures would reduce the risks currently associated with defining inner functions:
function outer_func(...)
    x = 0
    ...
    closure inner_closure(...)
        x = 1 # can capture x
    end
end
That’s close to the idea I reached in the middle of the thread, but I’m not sure if it’s worth the hassle:
1. I considered the “can use but not assign outer scope variables” idea upthread, too, but that carries its own drawback. In a Python equivalent of your example, the inner `computation` assigns `result` without accessing the outer `check_computation`’s `result`. But what if it did `result += 1`? Python raises an `UnboundLocalError: local variable 'result' referenced before assignment` when you call `check_computation`. If we mean `result (local) = result (outer) + 1`, we’re out of luck, see (3). If we mean to reassign the outer `result`, we can write `nonlocal result`, but see (2).
2. To deal with (1), we’d have to edit a LOT of keywords `nonlocal`/`outer`/`closure` into our code because it’s not just functions that introduce new scopes. It’d absolutely butcher one-line statements and deeply nested scopes that reassign a variable in every level. People were already so annoyed by the comparatively easier need to write `global` when pasting code from inside a function to the REPL that they ended up tweaking the REPL’s global scope behavior.
3. It’s a lot easier to just sidestep this issue instead of finding some scope rule everybody can be happy with. If your function doesn’t need to access any sort of outer variable, don’t nest it at all. If your function needs to access an outer variable `x` but doesn’t want to modify it, make a variable with a different name, `x2 = x`, like you can in Python. Besides, it’s much more readable: we know `x2` isn’t the same variable as `x` no matter where they are.
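The renaming idea in (3) might look like this in Julia. It is only a sketch; `rename_demo` and `inner` are hypothetical names.

```julia
function rename_demo()
    x = 1
    function inner()
        x2 = x      # read the captured `x` under a new, local name
        x2 += 1     # rebinding `x2` never touches the outer `x`
        return x2
    end
    y = inner()
    return (x, y)
end
```

Calling `rename_demo()` returns `(1, 2)`: the inner function read `x` but the outer `x` is unchanged.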
If the desire is information hiding and not capturing outer-scope variables, there are ways to do that. First, you could put your `computation` in its own module and pass information explicitly. Modules are semi-private, and information is hidden in the sense that nobody will accidentally call `PrivateModule.computation`. Second, it should be relatively straightforward to make a macro like `@private function computation` that puts a `local` before every identifier inside `computation`.
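A minimal sketch of the module option follows. `PrivateModule` is the name used above; `check_with_module` is a hypothetical caller that returns the `(expected, computed)` pair. Note that modules can only be defined at top level, not inside a function.

```julia
module PrivateModule

# Not exported, so callers have to write `PrivateModule.computation`.
function computation(a, b)
    result = a + b + 1   # local to this function; there is no outer `result` to capture
    return result
end

end # module

function check_with_module()
    a, b = 10, 20
    result = a + b
    computed = PrivateModule.computation(a, b)
    return (result, computed)
end
```

Here `check_with_module()` returns `(30, 31)`, so the discrepancy is visible because nothing is shared implicitly between the two functions.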
I am not convinced that the “whole purpose” for scope is to control visibility. At least as early as Algol and Lisp, nested procedures could access information from encapsulating scopes. It makes sense for Rust to disallow that, since it is meant to be a secure language more so than a convenient one. I’m under the impression that Julia is intended to be fun and quick, with less boilerplate than `private static void`. Also, it is not object-oriented, where one might traditionally encapsulate a bunch of methods within a class. Julia puts hierarchy under modules, which should control visibility, just differently from Rust.