Julia's loop scope behavior differs from Fortran?

Hi, the oddest thing about Julia is how it behaves in loops (especially For loops).
The following example gives me an error, DataVar: Not Defined, and I cannot imagine why:

function TestLogic(Nloops)
    for i in 1:Nloops
        println("i, Nloops = ",i,", ",Nloops)
        if (i == 1)
            DataVar = 333
        end
        println("i = ", i)
        println("DataVar = ", DataVar)
        DataVar += 10^i
        println("DataVar = ", DataVar)
    end
    return
end

TestLogic(3)

As an old Fortran programmer, I expect the variable DataVar to be defined during the i=1 pass, and incremented each time afterwards. But why does the For loop forget about DataVar completely??

I am aware of the “controversy” about For loops not knowing about variables defined outside of the loop. But here, DataVar is being defined INSIDE the loop! What’s going on??

DataVar is only defined for i == 1? It would be better IMHO to do

julia> function TestLogic(Nloops)
           DataVar = 333
           for i in 1:Nloops
               @show i
               @show DataVar
               DataVar += 10^i
               @show DataVar
           end
           nothing
       end
TestLogic (generic function with 1 method)

julia> TestLogic(3)
i = 1
DataVar = 333
DataVar = 343
i = 2
DataVar = 343
DataVar = 443
i = 3
DataVar = 443
DataVar = 1443

The scope of a variable first assigned inside the loop body is a single iteration of the for loop, not all iterations together.

If you want to share the variable across all iterations of the for loop, you could also (as an alternative to the suggestion of @briochemc) write

local DataVar # this does not define the type of DataVar yet
for i in 1:Nloops
    if i == 1
        DataVar = 333
    end
    DataVar += 10^i
    # ...
end

Edit: It is still required to set the 1st value of DataVar, code corrected.

Guys, thanks for the suggestions, but in my real program DataVar is an extremely compute-intensive quantity which must first be calculated during the i=1 pass, and then every subsequent (i+1)th pass must have available the value left behind by the calculation in the previous ith pass. It cannot be specified before the i=1 trip.

In FORTRAN (and in general), the i=2 pass would know about the definition made in the i=1 pass. I don’t understand why this isn’t true in Julia, or how to fix it.

Lungben, would your “local DataVar” suggestion make the loop work as written? If not, does anyone know how to make each pass through a For loop know about the variables from the previous pass? Thanks!

I think yes. However, I had an error in my code above, it is corrected now.

@CosmoProf Could you post a minimal Fortran example? From my hazy memory, variables in Fortran 95 are typically declared first (with a type), and in the examples I found online the variables are always declared before the loop.

The type of code you gave would also not work in compiled languages I know such as C, C++, Java and Go…

I think you are looking for foldl then, not a loop.
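
To make the foldl idea concrete, here is a minimal sketch. The names `expensive_first_pass`, `update`, and `TestLogicFold` are hypothetical stand-ins (they are not from the original post), and the sketch assumes Nloops is at least 1 and that each pass only needs the value left by the previous one.

```julia
# Hypothetical stand-ins for the real computation (not from the original post).
expensive_first_pass() = 333       # the costly value that is only known on pass 1
update(prev, i) = prev + 10^i      # each pass builds on the previous pass's value

function TestLogicFold(Nloops)
    # Pass 1 does the initial calculation plus the first update,
    # mirroring the loop in the question.
    first_value = update(expensive_first_pass(), 1)
    # foldl threads the accumulated value through passes 2:Nloops.
    return foldl(update, 2:Nloops; init = first_value)
end

TestLogicFold(3)
```

With these stand-ins the call reproduces the final value of the loop versions in this thread. Whether a fold reads better than a loop with a pre-declared variable is a matter of taste.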

Yes, it does. The analogue of the Fortran code would be to declare every variable as local in the first lines of the function. Once you get used to the fact that you don’t need that anymore, the behavior will start to make more sense. It happened to me, a former Fortran programmer as well.

@lungben’s solution does what you want then, no?

julia> function TestLogic(Nloops)
           local DataVar
           for i in 1:Nloops
               @show i
               (i==1) && (DataVar = 333)
               @show DataVar
               DataVar += 10^i
               @show DataVar
           end
           nothing
       end
TestLogic (generic function with 1 method)

julia> TestLogic(3)
i = 1
DataVar = 333
DataVar = 343
i = 2
DataVar = 343
DataVar = 443
i = 3
DataVar = 443
DataVar = 1443

Please ignore this post: the benchmark was wrong, likely because I forgot to add -O3 when compiling the Fortran code.

I will add some information here for Fortran programmers and, if anyone with a deeper knowledge of the inner workings of the compilers can correct something, please do.

Let us consider this simple function:

function loop(x,n)
  s = 0.
  for i in 1:n
    r = sqrt(i)
    s = s + x*r
  end
  s
end

Which would have the equivalent Fortran code:

double precision function loop(x,n)
  implicit none
  integer :: i, n
  double precision :: x, r
  loop = 0.
  do i = 1, n
    r = dsqrt(dble(i))
    loop = loop + x*r
  end do
end function loop

From the perspective of this thread, the difference is that the variable r in the Julia code is only known locally within each iteration of the loop, while, at least from a syntactic point of view (I won’t speculate on what the compiler actually does), the variable is visible everywhere in the Fortran function.

The fact that the r variable is visible only locally in the Julia loop apparently allows further compiler optimizations than in the Fortran code. Indeed, the benchmark of these functions is very favorable to the Julia version:

julia> x = rand() ; n = 10_000_000;

julia> @btime loop($x,$n)
  15.039 ms (0 allocations: 0 bytes)
6.425913494015874e9

julia> @btime ccall((:loop_,"loop.so"),Float64,(Ref{Float64},Ref{Int64}),$x,$n)
  87.733 ms (0 allocations: 0 bytes)
6.425913494015874e9

(the overhead of calling is negligible, and that is easy to test using a small n)

The Fortran version was compiled with:

gfortran -O3 loop.f90 -shared -o loop.so

Evidently, a different compiler could perhaps figure out that r does not need to be visible outside the loop and do a better job of compiling the Fortran code. But, in general, the more restricted scope of variables in Julia allows better performance than naive translations of the equivalent code in Fortran.

At least this has been my experience, and these scoping differences are the only reason I can imagine to explain those performance advantages of Julia.

The later compiler stages are smart enough to understand that there is no difference between

function loop(x,n)
  s = 0.
  for i in 1:n
    r = sqrt(i)
    s = s + x*r
  end
  s
end

function loop2(x,n)
  s = 0.
  r = -1.0
  for i in 1:n
    r = sqrt(i)
    s = s + x*r
  end
  s
end

LLVM mem2reg munches that for breakfast. Heck, scoping info is thrown away in @code_lowered already! So, what is the reason for restricting the scope of loop-local variables?

As @StefanKarpinski recently quipped, most scoping related language design decisions in julia are for closures. My addition to this would be “…because compiler support for closures is suboptimal, for very technical implementation-specific reasons” [*].

The code is transformed from expression-tree into SSA-form during lowering. However, closures are emitted and partially optimized in a piece of femtolisp before this happens, operating on the expression-tree. The optimizations in this phase are very brittle and must operate with very impoverished information (e.g. no types, no dom-tree).

So I’d guess that this scoping decision was made in order to simplify that code’s job.

[*] I’m not trying to throw shade here. Jointly optimizing closure layout and code operating on the closures is super hard! And we need to do it in an open-world context, where users can later define new functions operating on the closures. Most compiler techniques optimize code only, and require users to manually optimize data structures.

Yes, that is clear. Does that mean that gfortran is just too bad?

It’s documented by the phrase “as if the loop body were surrounded by a let block”, but the examples don’t make it as crystal clear as this simplified version of your method:

function f()
for i in 1:3 # new i every iteration
   if i == 2
      x = 3 # x will only exist at i == 2
   end
   println(@isdefined x) # nifty macro to see if variable exists
end
end
f() # false true false

Like the others said, if you only want to do the computation in one of the iterations but want the variable to survive its iteration, you can make the variable before the loop without doing the computation yet. But mind that this works in a function body and the REPL’s global scope but not a file’s global scope.
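
A minimal sketch of "make the variable before the loop without doing the computation yet", using `nothing` as a placeholder. The names `expensive_setup` and `TestLogicDeferred` are hypothetical and only stand in for the real calculation.

```julia
# Hypothetical placeholder for the costly first-pass calculation.
expensive_setup() = 333

function TestLogicDeferred(Nloops)
    DataVar = nothing              # the binding exists in function scope; no work done yet
    for i in 1:Nloops
        if DataVar === nothing
            DataVar = expensive_setup()   # computed once, on the first pass
        end
        DataVar += 10^i            # later passes see the value left by the previous pass
    end
    return DataVar
end

TestLogicDeferred(3)
```

Inside a function this needs no extra keywords. At the top level of a file, the loop body would need `global DataVar` to assign to the existing variable. One trade-off of the placeholder: DataVar can hold either `nothing` or a number, so `local DataVar` as suggested above is the leaner choice when the first pass always runs.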

I need to take some time to check whether the documentation would benefit from a gigantic red box pointing out this for loop scope property. It is not the first time this question has come up. And it is also important if you are dealing with closures inside a for loop. There is a section of the manual that points out this scope property, and also that you may reuse an existing variable as the iterator with the keyword outer, but maybe it is not very noticeable.
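
For reference, a small sketch of both points: closures created in a loop each capture their own iteration's variable, and `outer` reuses an existing local as the iteration variable. The function names are made up for illustration.

```julia
function closures_per_iteration()
    fs = Function[]
    for i in 1:3
        push!(fs, () -> i)     # each closure captures that iteration's own `i`
    end
    return [f() for f in fs]   # [1, 2, 3], not three copies of the last value
end

function last_index_with_outer()
    i = 0
    for outer i in 1:3         # reuse the existing local `i` instead of a fresh one
    end
    return i                   # 3: the loop wrote to the enclosing `i`
end

closures_per_iteration(), last_index_with_outer()
```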

The fundamental problem is that every red box will just increase the information load of the reader, at the expense of other parts. Just like you cannot make a lot of things bold in some text.

I think that having something in the manual is enough, and the implicit expectation is that the whole manual should just be read at some point by a user investing into learning Julia.

I think we have already had this discussion before, Tamas. Every time I suggest emphasizing or explaining anything, you come with the argument that this will be overdone and therefore negate any beneficial effects. To which I answer that I trust the Julia maintainers to decide whether or not to accept such changes, avoiding overdoing it while keeping the cases in which it is beneficial. At this point we could save some effort and agree on having generic replies that we copy into each other's posts every time this happens.

I agree that readability suffers from bolding and redboxing every little gotcha someone can have coming from another language, but I would say that “the whole manual should just be read” isn’t actually enough. It definitely should be done, but the manual is kinda big. People who have read the manual may fail to pick up or remember some details, and it’s nice to look up a small section quickly.

Fortunately for this topic, you can. Type “loop” into the documentation page’s search bar and the top result IS the section on loop scope rules (oddly the section introducing loops is 3rd). Even the section introducing loops notes the scoping rule and links the scope rules page. So there has been a sufficient effort to let people reach this particular topic in the manual with minimal effort. Personally I think one of the section’s examples can be tweaked to hammer home this topic’s point, but the preceding let section is clear enough.

Really, I suspect a common problem is that people don’t know they can often use the documentation’s search bar to find what they need. Google is often no help because you reach discourse and stackoverflow questions instead of the right manual section, and if the questions are too specific to their poster’s use cases, it easily causes people to believe they need to post their own question.

Scope behavior decisions are made entirely for the sake of user-facing semantics, not at all for the sake of the compiler. If we wanted something that was easy on the compiler, we wouldn’t have closures at all or would have closures that capture by value.
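
To illustrate what "capture by value" would change, here is a small sketch (the name `capture_demo` is made up): a Julia closure captures the variable itself, so a later reassignment is visible through it, while a `let` block can be used to freeze the current value.

```julia
function capture_demo()
    x = 1
    by_binding = () -> x       # captures the variable itself, not its current value
    by_value = let x = x       # `let` makes a fresh binding holding the current value
        () -> x
    end
    x = 2                      # reassign after both closures exist
    return (by_binding(), by_value())
end

capture_demo()   # (2, 1)
```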

What I take issue with is the following:

  1. someone complaining that X is not documented,

  2. when they learn that it is, converting the complaint to it not being prominent enough in the manual.

I think the fundamental problem is people not reading the manual (multiple times, coming back to parts, as they would read a textbook on some nontrivial subject).

Making parts of it more prominent is unlikely to help this; the manual is not for linear reading like a paperback novel. Yet it is suggested like some panacea every time.

OTOH, what the manual could benefit from greatly is rewriting chapters that are very old and had a lot of parts tacked on as the language grew. This often requires nothing more than careful editing and reviewing the issues people run into. I have done this in #38271 and plan to do more.

Exactly. Put simply, no edit to the manual will help if people don’t know how to look up sections, just like with real books. You can put flashing neon lights on the section and it wouldn’t help if the reader hasn’t reached the page.
That’s not to say this can’t be improved ever. I just remembered this topic where it turned out typing == and === in the search bar didn’t return the right sections in Base or Manual.