Explain scoping confusion to a programming beginner

I was typing out an example because I thought I knew where the confusion was coming from, but it turns out I don’t understand scoping either. Julia 1.4.2:

julia> module Foo
       x = 1
       println("From module Foo ", x)
       function f()
           println("From Foo.f ", x)
           println("From Foo.f after assignment ", x) # I know I haven't done any assignment here, see the next example for context
       end
       end
From module Foo 1
Main.Foo

julia> import .Foo

julia> Foo.f()
From Foo.f 1
From Foo.f after assignment 1

This works, but just add one more line and it doesn’t.

julia> module Foo
       x = 1
       println("From module Foo ", x)
       function f()
           println("From Foo.f ", x)
           x = 2
           println("From Foo.f after assignment ", x)
       end
       end
From module Foo 1
Main.Foo

julia> import .Foo

julia> Foo.f()
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] f() at ./REPL[1]:5
 [2] top-level scope at REPL[3]:1

That is the second bullet point about x now being assigned in local scope.

So the error is about ambiguity of x and not x being undefined?

There is no ambiguity. In your case, x is a local variable and you are trying to read it before it is defined. If you wanted to use the global x you write global x.
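As a minimal sketch of the `global x` route (the module name `FooFixed` is made up here so it does not clash with `Foo` above), declaring `x` global inside the function makes every use of `x` in that body refer to the module-level variable:

```julia
module FooFixed   # hypothetical name, chosen only for this illustration
x = 1
function f()
    global x          # every `x` in this body now refers to FooFixed.x
    before = x        # reads the module-level value
    x = 2             # rebinds the module-level variable
    return (before, x)
end
end

result = FooFixed.f()
println(result)       # (1, 2)
println(FooFixed.x)   # 2
```

Note that this changes the module's `x` for everyone, which may or may not be what you want.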

One particular fact that might not match your intuition is that a given identifier (i.e. variable name) can only have one meaning in a given scope block: if it’s local anywhere in a block, then it’s local everywhere in that block. You might be expecting x to be global the first time you access it and then become local after it is assigned, but that cannot happen. Since x = occurs in the function body, x is local to the function both before and after that assignment.
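A small sketch of that rule, with made-up function names (`reads_only`, `assigns_later`): the only difference between the two functions is whether an assignment to `x` appears somewhere in the body.

```julia
x = 1   # a global

function reads_only()
    return x            # no assignment to x in this body, so x is the global
end

function assigns_later()
    first_read = try
        x               # x is already local here, and not yet defined
    catch err
        err             # capture the error instead of crashing
    end
    x = 2               # this assignment makes x local to the whole body
    return (first_read, x)
end

println(reads_only())       # 1
r = assigns_later()
println(r[1] isa UndefVarError)   # true
println(x)                  # 1, the global was never touched
```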

And let blocks too. I use them extensively in Jupyter precisely because they introduce a local scope.

Yep. I always lump them in with function bodies in my head but yes, those too.
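For illustration, a minimal sketch of a let block keeping a temporary out of the global namespace (the names `a`, `tmp` and `result` are just placeholders):

```julia
a = 10

result = let
    tmp = a * 2     # tmp is local to the let block
    tmp + 1         # the value of the block is its last expression
end

println(result)             # 21
println(@isdefined tmp)     # false, tmp did not leak out of the block
```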

By now I understand the scoping rules I believe, and @scimas’ code makes sense to me, about Julia looking at the entire function first, instead of say Matlab which does some things statically and other things dynamically.

Speaking of Matlab, my confusion is just as to why the scoping rules for Julia are the way they are. I come from Matlab where the scoping rules make so much intuitive sense to me, and I’ve never had a problem with it. But knowing the ideologies of Julia creators, I have faith that there’s a bigger picture as the foundation for why the Julia scoping rules are the way they are.

My question was two-fold, i.e. 1. How does scoping work and 2. What is the motivation behind such rules. #1 is answered, #2 is still in the air for me, with some unclear hints to the answer.

I apologize for not being clear in my long-winded post haha!

I am not sure that is the right model to have. I think the right way to think about it is that Matlab has scripts whereas Julia doesn’t (except for Jupyter notebooks, which can serve a similar purpose). This isn’t exactly true, but it is close enough to the truth to form a mental model. In Julia, what look like scripts (e.g. the .jl files) aren’t. They are simply text which can be included like anything else into a session in whatever order you wish (doing whatever you want in between, e.g. using the REPL).

That is why in matlab in your .m files you have to manually say global on a variable, and everything else is local to the script. In julia, anything at the top level is a global.

In fact, the global keywords in the language are completely different. In matlab you declare a variable as global in your script because otherwise they are local to it, whereas in julia you provide the global in a function/loop/scope to tell the compiler you want to refer to a top-level variable rather than create a new local one.

The main place that “everything top level is global” gets you is in loops. This doesn’t happen in matlab because loops don’t introduce a new scope. This is actually a major benefit of Julia for avoiding bugs, reasoning about code, etc., but it has this confusing scoping downside.

In Julia <= v0.6, in Jupyter notebooks all along, and in the REPL for Julia >= 1.5, there is special scoping behavior that makes this downside basically non-existent. You will find working in Jupyter entirely intuitive, and soon the same will be true for the REPL. For .jl files, you will still want to wrap things in functions… but there are plenty of other reasons to do that.

I don’t think there is any ideology here, just tradeoffs. The only ideological decision distinct from matlab is that loops/comprehensions/etc. introduce a scope - which I think is a good thing.

There are downsides to the scoping approach taken in v0.6, Jupyter, and the v1.5 REPL as well, though these typically confuse more advanced users rather than beginners. Rehashing those downsides wouldn’t be helpful, but suffice it to say the change in Julia v0.7 was made for legitimate reasons.

It is all very confusing, especially for people coming from matlab, where the global keyword means something completely different. The issue was big enough that the language designers brought things back to have consistent scoping in the REPL to match jupyter.

My suggestion is simple:

  1. Use Jupyter for “scripts” where you are doing exploratory top-level code that you intend to copy/paste into functions eventually. In 1.5 you can do the same in the REPL as well.
  2. Put all loops/etc. inside of functions in the .jl files
  3. If you follow (1) and (2) you will never see this issue again. The scoping will be completely intuitive for normal stuff.
  4. Forget everything just discussed in this thread. There are much more important things to learn, and in practice you won’t need to think about this unless you do very advanced programming in julia.
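
To make suggestion (2) concrete, here is a rough sketch of a loop moved into a function (the name `sum_to` is made up). Inside a function everything is local, so the loop can update `t` without any annotation and behaves the same in a .jl file, the REPL, or Jupyter:

```julia
function sum_to(n)
    t = 0               # local to sum_to
    for i in 1:n
        t += i          # updates the enclosing local t, no `global` needed
    end
    return t
end

total = sum_to(10)
println(total)          # 55
```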

If you are really interested in the history, here are some links:

Way back in 2012, when the language was in its infancy, there was a discussion about scoping that eventually motivated “soft” and “hard” scopes, which Julia had until 0.6:

This was by and large intuitive, but had some corner cases. Scope was simplified conceptually for 1.0:

but some people still found it unintuitive:

so

was introduced to make a particular use case easier.

You will notice that a lot of thought went into scoping rules. If you want a deeper understanding, I recommend working through the code examples people posted in various discussions above (and some others you will find from there); I found it really instructive.

Generally it is not easy to define scoping rules that are intuitive (“do what I mean”), yet easy to reason about (including corner cases), especially in languages that don’t have different syntax for assignment and introducing new variables.

IMO the important thing is not whether scope behaves in a way that users from some other language will find intuitive, since Julia users come from variety of languages with different solutions to scope; but whether scoping is easy to understand and apply in practice after reading the relevant chapter in the manual. Personally, I think Julia’s current approach is a rather nice practical solution.

Ah that makes a lot of sense. I’m increasingly coming to realize that things that may look the same in different languages can be quite different.

That makes sense. When I mentioned ideologies, I was referring to the more general purpose of Julia, to place the greater powers of programming into the hands of those who want easier ways to implement things while getting high performance out of it, particularly for large scale projects. And that filters down to the motivation for defining scoping rules regarding loops, which I have been asking about and can accept.

That’s an interesting observation, I was unaware of that, thanks for sharing.

Haha, yeah well I have quite a few questions in my experience with Julia that I haven’t chosen to ask about. I’ve just chosen this scoping one because it seems like other people have asked about it but the answers seemed pretty involved so I decided to ask for an answer for a beginner programmer, which is what I am.

Thanks again for your help! =)

Sorry if I was unclear, what I meant was that the downsides that @Tamas_Papp described in his response above, which triggered the change from the Julia v0.6 behavior, are typically issues for more advanced programmers, whereas the older behavior didn’t seem to confuse new users. A priori, I don’t think this was obvious, and it only became clear after Julia 1.0 was released (since beginners understandably didn’t engage in the long beta/release candidate stage prior to the release).

Anyways, glad that things are starting to be more clear. What I can tell you is that other than the issue you have stumbled on, scoping in Julia is far superior to Matlab/Python in nearly every other respect - especially when it comes to the possibility of writing efficient code with fewer silent bugs.

Ah, thanks for clarifying that as well. I was wondering what was going on behind the scenes that you were referring to.

And yeah, I’ve started reading the discussions in the links @Tamas_Papp shared, and it’s expanding my understanding. The things available upon a Google search lacked context for me I suppose.

That’s well expressed, and a nice mental commitment for me to remember and keep during my journey of learning Julia. I suppose I should spend more time humbly reading manuals and such, and absorbing concepts.

Yes, reading your links has definitely given me more scenarios to think about beyond the ones I’ve come across in my experience of using Matlab, which is kind of what I’ve been asking for in this post (but haven’t had the right words to ask with haha, sorry).

I deeply appreciate the comprehensive consideration of options for scoping definitions, and am grateful for the priorities you and the Julia team have in doing something that’s “right and practical” that people can get used to and realise the benefits along the way. Thanks again for your work!

I can give some motivation for both loops having their own scopes and for assignment to variables in loops defaulting to local rather than clobbering globals. Everyone focuses on small, easy, self-contained cases like this:

t = 0
for i = 1:10
    t += i
end

No one disagrees that in this case it’s quite obvious that the loop should update the global variable t. But that’s a really simple example, so it’s not surprising that it’s obvious what the user intended. In real world code, on the other hand, it’s quite common to see things like this:

t = 1

# do some stuff with `t`
t *= 3

include("file.jl")

# do some more stuff with `t`
t += 1

println("t = $t")

Looks fine, right? Should print t = 4. Great. But let’s suppose that this is the contents of file.jl:

for j = 1:3
    t = time()
    # something that takes a bit of time
    sleep(1)
    println("elapsed time: ", time() - t)
end

The intention here is for t to be local to the for loop. This is not uncommon: people assign to variables in loop bodies all the time to use the value later in the loop without wanting it to leak outside of the loop. Now suppose that Julia’s scope behavior worked the way that people seem to think it’s obvious that it should behave and this code assigns to the global t. Now the code is broken. Instead of printing t = 4 like it should, it prints something like this:

t = 1.596037636333234e9

Oops. Note that this is very much not a hypothetical scenario. When we changed the scope rule in 1.0, it uncovered hundreds of bugs like this throughout the Julia ecosystem, many of them in Base Julia and stdlibs but also in packages and user code. This design, despite being popular in scripting languages, is really just a bug waiting to happen—it can and does bite everyone eventually. We should do better.

So what’s the fix? It’s pretty simple: loops should have their own scope and the default should be that variables are local unless otherwise indicated. And that’s exactly what Julia 1.0 does. The rule is safe and simple. There are, however, two issues with this:

  1. New users are confused by this, coming from less fastidious languages where loops don’t introduce scopes and assigning to a variable in a top-level loop pollutes the global namespace.

  2. It’s annoying to move code between function bodies and the REPL for debugging purposes because you often need to add/remove global annotations on assignments in loops.

Julia 1.5 fixes both of these issues by:

  • Making this “just work” in the REPL, where it’s not so crucial to prevent bugs like the one described above;
  • Making it a warning to implicitly shadow a global variable by a local in a loop or other “soft scope”, so you have to disambiguate with local or global.

This approach retains the safety of the 1.0 design while fixing both of these problems.
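
As a rough sketch of what disambiguating looks like in practice (variable names are placeholders, and `j * 10` stands in for the `time()` call from the example above), an explicit `local` or `global` behaves the same way whether the code runs in a file or interactively:

```julia
t = 1
t *= 3

for j in 1:3
    local t = j * 10    # explicitly local: the global t is left alone
end

t += 1
println("t = $t")       # t = 4

total = 0
for i in 1:10
    global total        # explicitly global: the loop updates the top-level total
    total += i
end
println("total = $total")   # total = 55
```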

Just a small additional comment on how it currently (post Julia 1.5) works in Jupyter Notebook:

so things can be tricky sometimes.

Therefore I personally always write local k inside a loop in global scope (which additionally guarantees that the behavior is the same regardless of whether this is pre- or post- Julia 1.5).

(note that when executed in Julia REPL the second loop would not print a warning and k would be equal to 1)

That’s a bit weird. I wonder if it should be reported as a macro expansion bug. Might be unavoidable, but it might be fixable.

It’s unrelated to macro expansion, but rather to how things like:

x = 10
for i in 1:10
    local x = (k = i)
end

are treated in REPL and Jupyter Notebook.

I filed https://github.com/JuliaLang/IJulia.jl/issues/938

Here’s a bug I ran into today in R:

n = 100
#
#
#
# lots of code
#
#
#
df = tibble(A = runif(100), D = runif(100))
for(n in names(df)){
    print(n)
    print(mean(df[[n]]))
}
#
#
#
#
#
x = array(NA, dim = c(n, n, 5))

This gives me the error

Error in array(NA, dim = c(n, n, 5)) : 
  negative length vectors are not allowed
In addition: Warning message:
In array(NA, dim = c(n, n, 5)) : NAs introduced by coercion

which leads me to think that somewhere n is set to be negative. In reality, n is a String because in R loop variables are in global scope.

This is a lot harder to debug than remembering to use global in certain places imo.
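
For comparison, a rough Julia analogue of the R snippet above (the `cols` dictionary is a made-up stand-in for the data frame). In Julia the iteration variable of a for loop is always a fresh local, so reusing the name `n` does not clobber the global:

```julia
n = 100

# hypothetical stand-in for the two-column data frame in the R example
cols = Dict("A" => [1.0, 2.0, 3.0], "D" => [4.0, 5.0, 6.0])

seen = String[]
for n in sort(collect(keys(cols)))   # this n is a new local, not the global
    push!(seen, n)                   # mutating seen is not an assignment
end

x = fill(NaN, n, n, 5)               # n is still 100 here
println(size(x))                     # (100, 100, 5)
```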

See? We didn’t make this problem up. You were lucky in this case that the value that global n ended up having caused an outright error and not just silently wrong answers. The fact that we found so many instances of this kind of bug when we changed the behavior from 0.6 to 1.0 suggests that silent bugs of this kind are very common.
