Should a reassigning begin block execute before a variable's value is set?

The YouTuber leddoo posed an interesting “bug” (in some other language). It’s hard to explain in words, but it has to do with whether a value is set before or after a subsequent block executes. Commenters were not unanimous on whether the result should be 3 or 4; leddoo opined that it should be 3.

I adapted a code example to illustrate this behavior in Julia (v1.8). Interestingly, it depends on whether the variable is global or local, and it does not matter if the begin block is changed to a let block. Does anyone know how to explain or justify this discrepancy?

julia> a = 1; (a, a + begin global a+=1 end, a)  # reassign first
(2, 4, 2)
julia> a = 1; (a, a + let; global a+=1 end, a)
(2, 4, 2)
julia> let a = 1; (a, a + begin a+=1 end, a) end # reassign in order
(1, 3, 2)
julia> let a = 1; (a, a + let; a+=1 end, a) end
(1, 3, 2)

The global a behavior can also be reproduced with a function that reassigns a global variable, which seems like a more common, if still rare, problem to run into.

julia> function aplus() global a+=1 end
aplus (generic function with 1 method)
julia> a = 1; (a, a + aplus(), a)
(2, 4, 2)

You can get around the bare global a references being evaluated after the reassignment by doing some operation that returns a value equal to a:

julia> a = 1; (a+0, (a+0) + begin global a+=1 end, a+0)
(1, 3, 2)
julia> a = 1; (a+0, a+0 + begin global a+=1 end, a+0)
(1, 4, 2)

The second one slips to (1, 4, 2) because the middle expression parses as a single call, +(a, 0, begin global a+=1 end), whose bare global a is again read after the reassignment (2 + 0 + 2). So to replicate the local a behavior, you’d have to be pretty careful to make sure there’s no bare global a at the same or higher level as the reassignments in nested function calls.

This seems important enough to point out in the OP. Dan found that a typed global behaves like a local on v1.9, but it still behaves like an untyped global on v1.8:

julia> a::Int = 1; (a, a + begin global a+=1 end, a) # v1.8
(2, 4, 2)
-------julia upgrade---------
julia> a::Int = 1; (a, a + begin global a+=1 end, a) # v1.9
(1, 3, 2)

Not sure what exactly you’re expecting - the printout is a tuple, which is constructed from a, something computed from a, and a again. What the global annotation in your examples does is ensure that the value passed into + is updated. Said differently, your example can be written like this:

a = 1

let tupval1 = a, tupval3 = a, tupval2 = begin
        tmp = begin 
            global a = a + 1
        end
        a + tmp
    end
    (tupval1, tupval2, tupval3)
end

which just reorders the operations in terms of what the language sees. If you remove the global annotation, the assignment in the begin block doesn’t reassign the global variable a, so the addition is just 1 + 2. If you have global there, the result of a + 1 is stored in the global variable, which then results in 2 + 2. Whether you use let or begin in the inner block doesn’t really matter (though in reality there is no implicit let block in the global scope).

When I remove the global annotation in your example, it throws UndefVarError: a not defined. It also gives the result (1, 4, 1), so it doesn’t seem equivalent to any of my examples.

Apologies, that’s due to the order in the let I wrote there. Written like this:

julia> let tupval2 = begin
               tmp = begin 
                   global a += 1
               end
               a + tmp
           end; tupval1=a; tupval3=a
           (tupval1, tupval2, tupval3)
       end
(2, 4, 2)

you get (2,4,2).

My point is that the arguments to a function are evaluated when the call is made (technically in the order they’re written, though I don’t think that order is guaranteed). So things like

a = 1
a + begin global a+=1 end

can be rewritten like

a = 1
tmp1 = a
tmp2 = begin
    global a = a + 1
end
+(tmp1, tmp2)

without changing the meaning of +. That should also explain why let vs. begin doesn’t matter - it’s the global annotation that determines whether or not the global variable a is written to. The outer let blocks admittedly make this a bit more diffuse, because they introduce a local scope - and assignment to such variables works a bit differently.
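As a small cross-check of the “some order” point, here is a hypothetical probe in Python, where operand evaluation order actually is specified to be left to right (the `probe` helper is made up for the illustration):

```python
order = []

def probe(tag, value):
    # record when each operand is evaluated, then pass the value through
    order.append(tag)
    return value

result = probe("left", 1) + probe("right", 2)
print(result, order)  # 3 ['left', 'right']
```

The temporaries tmp1 and tmp2 in the rewrite above play exactly the role of these probed operands.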

See also Scope of Variables · The Julia Language

Specifically, the behavior you observe with the outer let blocks is due to them introducing a hard local scope.


This gives 4, but

this gives 3.

As for your first example, it’s basically evaluating the reassignment block first, as my global a examples do. But I don’t know if the (1, 3, 2) result from a local a can be replicated without just changing the order of the let variables. That wouldn’t really explain why there is a different block evaluation order between global and local a.

No, the distinction really does seem to be between global and local a. I could put the code in a local scope, but a global a still gives the result (2, 4, 2):

julia> a = 1; let; (a, a + begin global a+=1 end, a) end
(2, 4, 2)

I couldn’t put the local a code in the global scope, of course.


What puzzles me the most is that in the first two examples the first and third elements of the tuple have the same value, but in the last two examples they have distinct values. I would have expected the two simple references to a either (i) to be evaluated after any subexpression needed for calling the tuple constructor, that is, both would have the same value because a += 1 runs before the tuple constructor is called, or (ii) to be evaluated in order, because somehow the Tuple constructor breaks the rules and replaces the variables with their values in the order they appear in the argument list, before any subexpression is executed.

Another example of such ambiguity is:

julia> v = 1
1

julia> v+v
2

julia> (begin v+=1 end)+v
4

julia> v+(begin v+=1 end)
5

Which argument of + should be evaluated first?

The real question, in my opinion, is how to make Julia (or a linter) issue a warning when such ambiguity occurs (it is obviously fishy programming), while not warning about “acceptable” side effects of evaluation.
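To sketch what such a lint could look like, here is a rough prototype in Python using the stdlib ast module, with Python’s walrus := standing in for Julia’s in-expression assignment (the function name and the overall approach are made up for illustration). It flags any call whose argument list both reads a name and assigns to it:

```python
import ast

def fishy_calls(src):
    """Return (lineno, names) pairs for calls whose argument list
    both reads and assigns the same variable."""
    warnings = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Call):
            reads, writes = set(), set()
            for arg in node.args:
                for sub in ast.walk(arg):
                    if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Load):
                        reads.add(sub.id)      # a bare read of the name
                    elif isinstance(sub, ast.NamedExpr):
                        writes.add(sub.target.id)  # an in-expression assignment
            both = sorted(reads & writes)
            if both:
                warnings.append((node.lineno, both))
    return warnings

print(fishy_calls("f(a, (a := a + 1), a)"))  # [(1, ['a'])]
print(fishy_calls("f(a, b)"))                # []
```

A Julia version would walk lowered or surface ASTs the same way, looking for `=`/`+=` nodes nested inside call arguments.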


Just to add another bit of info:

julia> a::Int = 1; (a, a + begin global a+=1 end, a)
(1, 3, 2)

which shows the ordering effect is the result of globals being type-unstable, with boxing of variables interfering with optimization and reordering of code.


It doesn’t seem like they work like references. They are immutable after all, so their values can be copied. If you want a reference, this works:

julia> a = [1]; (a, a + begin global a[1]+=1; a end, a)
([2], [4], [2])
julia> let a = [1]; (a, a + begin a[1]+=1; a end, a) end
([2], [4], [2])

That’s basically what I’m asking here. It’s like order of operations but very weird because function calls don’t have access to variables like this, let alone reassign them. On top of that, local and global variables seem to play by different rules.

v1.8 is different!!

julia> a::Int = 1; (a, a + begin global a+=1 end, a)
(2, 4, 2)

The inlining aggressiveness might have been turned up (inlining causes 1-3-2).

But, again, the goal would be to somehow warn about this code.

Let’s take a look at what lowering has to say about the examples, rather than my poor attempt at expanding the code in my head. This is what the compiler/interpreter/the REPL sees:

julia> Meta.@lower (a, a + begin global a+=1 end, a)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1  = a
│   %2  = a
│   @ REPL[1]:1 within `top-level scope`
│         global a
│   %4  = a + 1
│   %5  = Core.get_binding_type(Main, :a)
│         #s1 = %4
│   %7  = #s1 isa %5
└──       goto #3 if not %7
2 ─       goto #4
3 ─       #s1 = Base.convert(%5, #s1)
4 ┄       a = #s1
│   %12 = %2 + %4
│   %13 = Core.tuple(%1, %12, a)
└──       return %13
))))

julia> Meta.@lower (a, a + let; global a+=1 end, a)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1  = a
│   %2  = a
│   @ REPL[2]:1 within `top-level scope`
│         global a
│   %4  = a + 1
│   %5  = Core.get_binding_type(Main, :a)
│         #s1 = %4
│   %7  = #s1 isa %5
└──       goto #3 if not %7
2 ─       goto #4
3 ─       #s1 = Base.convert(%5, #s1)
4 ┄       a = #s1
│   %12 = %2 + %4
│   %13 = Core.tuple(%1, %12, a)
└──       return %13
))))

In these examples, reading a is always a lookup in global scope and assigning to a is a write to global scope. You can see in the first example that the first two elements of the tuple really, truly, are reading from a first. The second example is the same; after all, the let itself doesn’t introduce variable bindings.

Now for the cases with an outer let:

julia> Meta.@lower let a = 1; (a, a + begin a+=1 end, a) end
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = 1
│        a = %1
│   @ REPL[3]:1 within `top-level scope`
│   %3 = a
│   %4 = a
│   %5 = a + 1
│        a = %5
│   %7 = %4 + %5
│   %8 = Core.tuple(%3, %7, a)
└──      return %8
))))

julia> Meta.@lower let a = 1; (a, a + let; a+=1 end, a) end
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = 1
│        a = %1
│   @ REPL[4]:1 within `top-level scope`
│   %3 = a
│   %4 = a
│   %5 = a + 1
│        a = %5
│   %7 = %4 + %5
│   %8 = Core.tuple(%3, %7, a)
└──      return %8
))))

Here, each read of a is a read of the local variable a from the outer local scope, which is then fed into +. It’s no different than if we were to write a = a + 1; all the a refer to the local variable. It’s just that the various reads happen either before or after the “middle element” of the tuple is computed.

This really seems to be a bug in Julia. The behavior of such examples should not have changed since 1.8, as there has been no breaking version number change since then.

By references I meant bindings: all a in those examples refer to the same binding and appear inside the argument list of a call to the tuple constructor. In most languages, I believe, any expressions in the arguments are evaluated before the call, and after that evaluation the value of the binding a should be 2, so the final tuple should behave as if a had the same value in all references to it in the call. Or is it Julia’s semantics that such an example has ambiguous behaviour?


That’s impossible - how would

a = 1
(global a += 1, global a += 2, a)
println(a)

behave? Which write happens first/is the “blessed” one? Do the reads & writes influence each other? You need some form of order of evaluation, and though the exact order is not defined, “left to right” is a valid one.


(global a += 1, global a += 2, a) throws an error, and ((global a += 1), (global a += 2), a) gives (2, 4, 4) on v1.8 (I could’ve written those parentheses as begin blocks and vice versa in the OP). Basically the same as the (1, 3, 2) result in the first post: the tuple element construction (weird to call it setindexing, but these examples work on vectors too) and the reassignments occur in left-to-right order.

I think Henrique is bothered that the behavior was not left-to-right in the global scope, where the begin block runs before any of the bare global a references are evaluated. Your example actually gets around that behavior because the expressions return the value of a rather than a reference to a. I can do the same thing, and I’ll actually update the OP with this:

julia> a = 1; (a+0, (a+0) + begin global a+=1 end, a+0)
(1, 3, 2)

And Dan has pointed out that a typed global a::Int=1 behaves like the local a in v1.9 but like an untyped global a in v1.8. I wouldn’t call it a bug because, AFAIK, nobody has ever documented the order of blocked reassignments in one expression, so it seems more like unspecified behavior. I’m not really pressed for anyone to figure this out; this isn’t like Python needing to add a walrus operator to do assignments in one-liner if headers. Despite the capability for one-liner assignments, I much prefer compound expressions (begin/let) to clarify order.


In your new example, you have two a += c and only one a. Now, a += c is the same as a = a + c (check the += docstring), and the behavior of = when used as an expression is to return the value of the right side (a + c) at the moment it was evaluated (check the = docstring). This can be seen very explicitly in the example:

  julia> a, b = 1:3
  1:3
  
  julia> a, b
  (1, 2)

That is, the result of an assignment expression is not a binding to the assigned variable but the full value of the right side. The same can be seen in:

julia> a :: Int = 0
0

julia> a = 1.0
1.0

julia> a
1

Clearly, a never held the value 1.0; that is impossible because of its type. However, the value of the assignment as an expression is 1.0. So there is no necessary relation between the value of an assignment and the value of the binding on the left-hand side.
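The split between “value of the assignment expression” and “value actually stored” can be mimicked in Python with a coercing setter (`Typed` is a made-up class, just to mirror the a::Int example):

```python
class Typed:
    def __setattr__(self, name, value):
        # convert on store, like assignment to a typed global a::Int
        super().__setattr__(name, int(value))

t = Typed()
rhs = 1.0
t.a = rhs
print(rhs, t.a)  # 1.0 1 -- the right-hand value and the stored value differ
```

As with the Julia snippet, the binding never holds 1.0 even though 1.0 is the value of the right-hand side.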

Now, let me clarify my point, and why the above is important. It is important because it means an example with two a += c does not mean much to me. Yes, the expressions must be evaluated in some order. Yes, I agree left-to-right is a valid one, maybe even the most intuitive one. But the two expressions return some value; they are not bindings to a. If the tuple constructor follows the rules of every other constructor/“function call” in most languages, then any nested calls/expressions inside the argument list of the call must be evaluated first, and then the call must be made with the bindings or values provided. The inconsistency here is that:

  • (i) sometimes the binding a is seen as a single expression that evaluates to its current value, and then the compiler goes on to the next argument in the argument list (left-to-right or whatever) and evaluates it completely, and so on;
  • (ii) sometimes the binding a is seen as a binding that will evaluate to its value just before the call, after all the expressions in every argument of the list have been evaluated; that is, it is guaranteed to have the last value that a had in that scope before the call is made. The behavior of a alone is what interests me; the behaviour of an assignment involving = is already clearly specified to return the value of the right side, even if it is a value a never really stored.

I may be wrong (this would be interesting to check), but what is intuitive to me, and what I remember from other languages, is option (ii) above: sole bindings in the argument list of a method call evaluate to their value at the last possible moment, and therefore always deliver the latest value the variable/binding had in that scope before entering the scope of the call. That is, a bare a is not an expression to be evaluated to the current value of the binding at some arbitrary point before the call (possibly before an assignment expression in the next argument slot changes it to another value); it is instead an indication that the argument slot is guaranteed to receive the latest value that variable had in that scope just before the call.
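One way to see what option (ii) would imply is to run the side-effecting subexpression first by hand, sketched here in Python (`bump` is a made-up stand-in for the begin block):

```python
a = 1

def bump():
    global a
    a += 1
    return a    # stand-in for `begin global a += 1 end`

blockval = bump()            # under (ii), the reassigning block runs first...
tup = (a, a + blockval, a)   # ...and every bare a then reads the final value
print(tup)  # (2, 4, 2) -- the same shape as Julia's untyped-global result
```

So option (ii) reproduces the (2, 4, 2) seen in the first post, while option (i) would give the left-to-right (1, 3, 2).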

In my honest opinion, this should either be defined (either the sequence of evaluations or the guarantee of last-value-in-scope semantics) or made into an error (in which case we would need to check exactly what should be disallowed). Simple assignments are already disallowed inside argument lists because they are interpreted as keyword arguments (or names for a named tuple).


I mentioned this before, but this isn’t specific to Tuple constructors:

julia> a = 1; [a, a + begin global a+=1 end, a]
3-element Vector{Int64}:
 2
 4
 2
julia> a = 1; (x = a, y = a + begin global a+=1 end, z = a)
(x = 2, y = 4, z = 2)
julia> struct Blah x::Int; y::Int; z::Int; end
julia> a = 1; Blah(a, a + begin global a+=1 end, a)
Blah(2, 4, 2)

If this behavior is ever changed and documented, I would prefer a left-to-right evaluation for consistency, (1, 3, 2) in the OP.

Yes, I feared that it would be common to all those cases; that is the reason I said “in most languages,” not “in other Julia constructs.”

My argument in favor of the other option (guaranteeing that sole bindings always deliver the last value that variable had in the caller scope) is, unfortunately, one based on intuitiveness (subjective) and tradition (not a great reason, except for preventing headaches for programmers coming from other languages). This is how C behaves here (strictly speaking, modifying a twice between unsequenced argument evaluations is undefined behavior in C, so compilers are free to differ):

#include <stdlib.h>
#include <stdio.h>

void f(int a, int b, int c, int d)
{
  printf("%d %d %d %d\n", a, b, c, d);
}

int main (int argc, char** argv)
{
  int a = 1;

  f(a, a += 1, a += 1, a);

  return EXIT_SUCCESS;
}
$ gcc -std=c99 -Wpedantic template.c 
$ ./a.out 
3 3 3 3

Interesting. The YouTuber leddoo points out that Python evaluates left to right.

def adda():
    global a
    a += 1
    return a
a = 1; print(a, a + adda(), a) # 1 3 2
# Julia equivalent prints 242

I don’t know C, but I tinkered with your C code and threw it into a few online compiler sites. The lines I added give almost consistent results, except one function call returned a 1 at the very end instead of a 3. Can you replicate that? I can’t tell if I just wrote something wrong.

edited C code
#include <stdlib.h>
#include <stdio.h>

void f(int a, int b, int c, int d)
{
  printf("%d %d %d %d\n", a, b, c, d);
}

int myadd(int a, int b)
{
    return a+b;
}

int main (int argc, char** argv)
{
  int a = 1;

  f(a, a += 1, a += 1, a);
  a = 1;
  f(a+0, a += 1, a += 1, a+0);
  a = 1;
  f(myadd(a, 0), a += 1, a += 1, myadd(a, 0));
  return EXIT_SUCCESS;
}

The point I was trying to make is that the assignments must happen in some order - you can have the reads mean the same thing, sure, but the result of printing a afterwards has to choose some version; you can’t have both values being true.

I am not entirely sure I understand your point. My C example has two a += 1 and both return 3; however, if we require some order of evaluation, it would make sense that one of them returns 2 (the first to execute), no?