Explain scoping confusion to a programming beginner

That’s well expressed, and a nice mental commitment for me to remember and keep during my journey of learning Julia. I suppose I should spend more time humbly reading manuals and such, and absorbing concepts.

Yes, reading your links has definitely given me more scenarios to think about beyond the ones I’ve come across in my experience of using Matlab, which is kind of what I’ve been asking for in this post (but haven’t had the right words to ask with haha, sorry).

I deeply appreciate the comprehensive consideration of options for scoping definitions, and am grateful for the priorities you and the Julia team have in doing something that’s “right and practical” that people can get used to and realise the benefits along the way. Thanks again for your work!

I can give some motivation for both loops having their own scopes and for assignment to variables in loops defaulting to local rather than clobbering globals. Everyone focuses on small, easy, self-contained cases like this:

t = 0
for i = 1:10
    t += i
end

No one disagrees that in this case it’s quite obvious that the loop should update the global variable t. But that’s a really simple example, so it’s not surprising that the user’s intent is obvious. In real-world code, on the other hand, it’s quite common to see things like this:

t = 1

# do some stuff with `t`
t *= 3

include("file.jl")

# do some more stuff with `t`
t += 1

println("t = $t")

Looks fine, right? It should print t = 4. Great. But let’s suppose that these are the contents of file.jl:

for j = 1:3
    t = time()
    # something that takes a bit of time
    sleep(1)
    println("elapsed time: ", time() - t)
end

The intention here is for t to be local to the for loop. This is not uncommon: people assign to variables in loop bodies all the time to use the value later in the loop, without wanting it to leak outside the loop. Now suppose Julia’s scoping worked the way many people assume it obviously should, so that this loop assigned to the global t. Then the code is broken: instead of printing t = 4 as it should, it prints something like this:

t = 1.596037636333234e9

Oops. Note that this is very much not a hypothetical scenario. When we changed the scope rule in 1.0, it uncovered hundreds of bugs like this throughout the Julia ecosystem, many of them in Base Julia and the stdlibs but also in packages and user code. This design, despite being popular in scripting languages, is really just a bug waiting to happen: it can and does bite everyone eventually. We should do better.

So what’s the fix? It’s pretty simple: loops should have their own scope and the default should be that variables are local unless otherwise indicated. And that’s exactly what Julia 1.0 does. The rule is safe and simple. There are, however, two issues with this:

  1. New users are confused by this, coming from less fastidious languages where loops don’t introduce scopes and assigning to a variable in a top-level loop pollutes the global namespace.

  2. It’s annoying to move code between function bodies and the REPL for debugging purposes, because you often need to add or remove global annotations on assignments in loops.
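To make the second point concrete, here is a minimal sketch (the helper name `sum_to` is hypothetical) of the same accumulation written at the top level of a file, where it needs `global`, and inside a function, where it does not. It also shows that a name first assigned inside a loop does not leak out of it. This follows the 1.0 rules for files; it is an illustration, not code from the thread.

```julia
# At the top level of a file (and in the REPL before Julia 1.5), updating a
# global from inside a loop needs an explicit `global` annotation.
t = 0
for i = 1:10
    global t += i
end
println("top level: t = $t")        # prints "top level: t = 55"

# The same loop pasted into a function body needs no annotation: `t` is a
# local of the function, so the loop body updates it directly.
function sum_to(n)                  # hypothetical helper name
    t = 0
    for i = 1:n
        t += i
    end
    return t
end
println("function: ", sum_to(10))   # prints "function: 55"

# A name first assigned inside a loop body stays local to that loop.
for j = 1:3
    elapsed = j * 0.5
end
println(@isdefined(elapsed))        # prints "false"
```

Moving the loop between the two forms means adding or removing that `global`, which is exactly the friction described above.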

Julia 1.5 fixes both of these issues by:

  • Making this “just work” in the REPL, where it’s not so crucial to prevent bugs like the one described above;
  • Making it a warning to implicitly shadow a global variable by a local in a loop or other “soft scope”, so you have to disambiguate with local or global.

This approach retains the safety of the 1.0 design while fixing both of these problems.
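One way to see the non-interactive (file) behavior without creating a file is `include_string`, which should evaluate code under the same rules as `include`. This is a sketch assuming Julia 1.5 or later; the code string and the `"file.jl"` name are only illustrative. The assignment inside the loop is ambiguous because a global `t` already exists, so it is treated as a new local and the global survives.

```julia
# Code that would normally live in a file; `t` already exists as a global
# by the time the loop is lowered.
code = """
t = 4
for j = 1:3
    t = j   # ambiguous: a global `t` already exists
end
t
"""

# On Julia 1.5+ this should print an ambiguity warning to stderr and treat the
# loop's `t` as a new local, so the value of the final `t` is still 4.
result = include_string(Main, code, "file.jl")
println(result)
```

Writing `local t = j` or `global t = j` in the loop body removes the ambiguity, and with it the warning.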

Just a small additional comment on how it currently (post-Julia 1.5) works in Jupyter Notebook:

so things can be tricky sometimes.

Therefore I personally always write local k inside a loop in global scope (which additionally guarantees that the behavior is the same regardless of whether this is pre- or post-Julia 1.5).

(Note that when executed in the Julia REPL, the second loop would not print a warning and k would be equal to 1.)
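A minimal sketch of that habit (the variable names and values are illustrative): with `local k`, the loop body gets its own `k` on every iteration and the global `k` should never be touched, whichever Julia version runs it.

```julia
k = 0
for i in 1:3
    local k = 10 * i            # explicitly local: a fresh `k` each iteration
    println("inside: k = $k")   # prints 10, 20, 30
end
println("outside: k = $k")      # the global `k` is still 0
```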

That’s a bit weird. I wonder if it should be reported as a macro expansion bug. Might be unavoidable, but it might be fixable.

It’s unrelated to macro expansion, but rather to how things like:

x = 10
for i in 1:10
    local x = (k = i)
end

are treated in the REPL and in Jupyter Notebook.

I filed https://github.com/JuliaLang/IJulia.jl/issues/938

Here’s a bug I ran into today in R:

library(tibble)
n = 100
#
#
#
# lots of code
#
#
#
df = tibble(A = runif(100), D = runif(100))
for(n in names(df)){
    print(n)
    print(mean(df[[n]]))
}
#
#
#
#
#
x = array(NA, dim = c(n, n, 5))

This gives me the error

Error in array(NA, dim = c(n, n, 5)) : 
  negative length vectors are not allowed
In addition: Warning message:
In array(NA, dim = c(n, n, 5)) : NAs introduced by coercion

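which leads me to think that n has somehow been set to a negative number. In reality, n is the string "D" (the last column name), because an R for loop does not create its own scope: the loop variable is assigned in the enclosing (here, global) environment and overwrites the earlier n.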

This is a lot harder to debug than remembering to use global in certain places imo.
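For comparison, here is a rough Julia translation of that R snippet. It is a sketch: a `Dict` of columns stands in for the tibble, and `fill(missing, n, n, 5)` stands in for `array(NA, dim = c(n, n, 5))`. Because a Julia loop variable is always a fresh local, the earlier `n` should still be 100 after the loop.

```julia
using Statistics: mean

n = 100
# Hypothetical stand-in for the tibble: a Dict of named columns.
df = Dict("A" => rand(100), "D" => rand(100))
for n in sort(collect(keys(df)))   # this `n` is a new variable local to the loop
    println(n)
    println(mean(df[n]))
end
x = fill(missing, n, n, 5)         # the global `n` is untouched
println(size(x))                   # prints (100, 100, 5)
```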

See? We didn’t make this problem up. You were lucky in this case that the value global n ended up with caused an outright error rather than silently wrong answers. The fact that we found so many instances of this kind of bug when we changed the behavior from 0.6 to 1.0 suggests that silent bugs of this kind are very common.
