Why won't Julia throw an error if a variable is not defined when a task is scheduled?

I’m very confused by this behavior:

# start a new julia REPL
julia> function work(b)
           sleep(3)
           println(b)
       end;

julia> function main()
           t = Task(() -> work(c))
           schedule(t)
           c = 9
           wait(t)
       end;

julia> main()
9

As you can see, c is not yet defined when schedule(t) is called. I would expect an error at schedule(t); am I right?


This comparison test is even more puzzling:

julia> function work(b)
           sleep(3)
           println(b)
       end;

julia> function main()
           t = Task(() -> work(c))
           schedule(t)
           println("I merely insert one line here")
           c = 9
           wait(t)
       end;

julia> main()
I merely insert one line here
ERROR: TaskFailedException
Stacktrace:
 [1] wait(t::Task)
   @ Base ./task.jl:370
 [2] main()
   @ Main ./REPL[2]:6
 [3] top-level scope
   @ REPL[3]:1

    nested task error: UndefVarError: `c` not defined in local scope
    Suggestion: check for an assignment to a local variable that shadows a global of the same name.
    Stacktrace:
     [1] (::var"#1#2")()
       @ Main ./REPL[2]:2

julia> Threads.nthreads(:default), Threads.nthreads(:interactive)
(255, 1)

julia> versioninfo()
Julia Version 1.11.6

There are no promises about when () -> work(c) will be called. Doing schedule(t) only schedules the task but it might not actually start for a while, so whether or not it crashes depends on whether it runs before or after c gets defined.

In your first case, c is assigned immediately after schedule(t), so it will usually be defined before the task actually starts, and no error occurs.

In the second case, doing some IO usually ends up switching between tasks because IO is slow - it can do some other work while the IO is happening. This means that the task actually got started before c was defined, hence the error.
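
A minimal sketch of that ordering, assuming a plain Task (sticky by default, so it stays on the thread that scheduled it). The helper name schedule_order is made up for illustration. The task body cannot start until the scheduling code reaches a yield point, which here is wait(t):

```julia
# Hypothetical helper: records the order in which things happen.
function schedule_order()
    events = String[]
    t = Task(() -> push!(events, "task body"))
    schedule(t)                        # only queues the task
    push!(events, "after schedule")    # no yield point has been reached yet
    wait(t)                            # yields, so the queued task can run
    push!(events, "after wait")
    return events
end

schedule_order()
```

Under those assumptions "after schedule" is recorded before "task body". Inserting something that yields (such as println or sleep) between schedule(t) and the push! would let the task run first.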

schedule(t) only does a push!(workqueue, t) and returns, where workqueue is an internal queue of tasks belonging to the scheduler. The next time the scheduler runs, the task is found and run. This typically happens when the current task enters some wait state, such as I/O, sleep, yield, wait, lock or similar. (When exactly this can happen is not really specified; I think the docs state that it can happen at any time.)

Anyway, when a task throws an error, the exception is just stored in its task struct before it terminates, and rethrown when a fetch or wait is performed. So you will not see it until a wait or fetch.
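
A small sketch of that behavior (the error message "boom" is just a placeholder): the task fails on its own, and the failure only becomes visible to the caller at wait, wrapped in a TaskFailedException.

```julia
t = Task(() -> error("boom"))
schedule(t)          # nothing is thrown here

# The failure only surfaces at wait (or fetch).
caught = try
    wait(t)
    nothing
catch err
    err
end

caught isa TaskFailedException   # the wrapper seen in the stack trace above
istaskfailed(t)                  # the task itself is marked as failed
```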

What I meant was that the argument to the work function should be fixed when the task t is scheduled. Otherwise it’s very weird: I have no idea what c really is, i.e. I lose track of what my code ends up doing.

For instance

task_vec = []
for i = [10, 20, 30]
    for j = i+1:i+5
        t = Task(() -> work(j))
        schedule(t)
        push!(task_vec, t)
        println("i = $i, j = $j")
    end
end

If j is fixed when schedule(t) is called, then I really know what my code is doing.
Otherwise, if the task defined at i = 10 only starts when i = 20, then the previous j is lost, no?

There is also the additional point that you’re (sometimes) allowed to reference variables before (in terms of location in code) they are declared, which is not related to multi-threading. E.g.

julia> function main()
           inc_c() = c += 1  # What is c at this point?
           c = 1
           inc_c()
           println(c)
       end;

julia> main()
2

(See also Is it okay that, in Julia, code that never executes can change the result of programs? - #21 by frankwswang)

From experience I’d say all variables are implicitly declared at the start of the narrowest enclosing scope.
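
A sketch that probes this, with a made-up function name. Inside the function, c is already a local before its assignment is reached; it just has no value yet, so reading it early throws UndefVarError instead of falling back to a global:

```julia
# Hypothetical helper to probe when the local `c` exists and when it has a value.
function declared_at_scope_start()
    get_c() = c                  # refers to the local c assigned further down
    before = @isdefined c        # c has no value yet at this point
    early = try
        get_c()
    catch err
        err
    end
    c = 1
    return before, early isa UndefVarError, get_c()
end

declared_at_scope_start()
```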

c is a local variable in main. The closure () -> work(c) captures c. In Julia, capturing a variable does not make an implicit copy. The time at which a schedule(task) call happens is not relevant.
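
The same point without any tasks, as a minimal sketch (function name made up): the closure reads the variable when it is called, not when it is created.

```julia
function capture_is_not_copy()
    c = 1
    read_c = () -> c   # captures the variable c, not the value 1
    c = 2              # reassignment is visible through the closure
    return read_c()
end

capture_is_not_copy()   # returns 2
```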

This is a different situation. The task contains the closure () -> work(j) where j is assigned anew for every iteration. This closure is really a callable immutable struct with one field j which is set when the closure is created. Like

struct uniquename
    j::Int
end
(s::uniquename)() = work(s.j)
...
t = Task(uniquename(j))

In the former case, c becomes an untyped “boxed” variable, which is initially undefined but can be assigned later (which it is). If it happens to be defined before it’s used in the task, everything will be fine.

What you can do is create the task with a new j, instead of capturing the outer j. For example:

let j = j
    Task(() -> work(j))
end

This still captures a j, but not the (outer) j. And the capture is not mutable/boxed, because the (inner) j is clearly never mutated.
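
A runnable sketch of the difference, with made-up names (let_snapshot, shared, frozen). Both tasks are created while j is 1, but only the one built inside let keeps that value after j is reassigned:

```julia
function let_snapshot()
    results = Int[]
    j = 1
    shared = Task(() -> push!(results, j))      # captures the outer, reassigned j
    frozen = let j = j
        Task(() -> push!(results, j))           # captures the new inner j
    end
    j = 2
    schedule(shared); wait(shared)
    schedule(frozen); wait(frozen)
    return results
end

let_snapshot()   # returns [2, 1]
```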

If you do a @code_warntype on your original main, you’ll see the boxing of the c variable:

julia> @code_warntype main()
MethodInstance for main()
  from main() @ Main REPL[47]:1
Arguments
  #self#::Core.Const(Main.main)
Locals
  #35::var"#35#36"
  c::Core.Box
  t::Task
...

Great, now who’s going to explain why boxing happens?

Basically, Julia’s frontend boxes all local variables that get captured by a closure and may get reassigned. In other words, to be able to support reassignment of local variables that get captured, Julia’s frontend changes the type of the local variable to a mutable type with a single mutable field, so, thereafter, Julia just sees mutation, not reassignment.
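
To make that concrete, here is a hand-written analogue that uses Ref{Any} as a stand-in for the internal box. This is only an illustration of the idea, not what the frontend literally emits, and the function name is made up:

```julia
# Hand-rolled analogue of boxing: the variable becomes a mutable container
# and "reassignment" turns into mutation of that container.
function boxed_by_hand()
    c = Ref{Any}()              # the box exists but holds nothing yet
    read_c = () -> c[]          # the closure captures the box itself
    had_value = isassigned(c)   # nothing has been stored so far
    c[] = 9                     # plays the role of `c = 9`
    return had_value, read_c()
end

boxed_by_hand()   # returns (false, 9)
```

Reading c[] before the store would throw, which mirrors the UndefVarError in the original example.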

Somewhat discussed in this issue:

Probably someone should make a doc PR.

I can somewhat understand what your code is doing, but I cannot get your overall meaning. (Since I haven’t written any packages, I have almost zero experience with struct, module etc., because as a user I don’t need to define them myself.)

Some context to what @sgaure is saying:

  • the built-in syntax for functions in Julia is just syntax sugar for:

    • defining a new type, if it’s a new function

    • adding a method to the type making it callable

  • in Julia, any type can be callable, just add a method

  • in Julia, new types can be struct or mutable struct (or abstract type, but obviously those don’t have instances, so they are not relevant here)

TBH I’m not completely sure what @sgaure’s point was either. I think they’re saying that some closures get lowered to a mutable struct, but I’m not sure if that’s true.
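
One way to check this is to inspect the closure objects directly. The sketch below relies on implementation details (closure field names, Core.Box) that may differ between Julia versions, so treat it as exploratory; make_closures is a made-up name:

```julia
function make_closures()
    j = 3
    plain = () -> j     # j is assigned once, before capture
    c = 1
    boxed = () -> c
    c = 2               # c is reassigned after capture
    return plain, boxed
end

plain, boxed = make_closures()

fieldnames(typeof(plain))       # the captured variable shows up as a field
fieldtype(typeof(plain), :j)    # stored with its concrete type
fieldtype(typeof(boxed), :c)    # stored as a Core.Box
ismutabletype(typeof(boxed))    # the closure struct itself is not mutable
```

On the versions I have in mind, the closure struct stays immutable in both cases; what changes is that the reassigned variable is held in a mutable Core.Box field.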

Well, you made me discover something interesting.

I thought for and let were somewhat alike, since they both introduce a local scope.

But now I believe let is simpler and thus superior, because it lacks the Union that for has:

julia> function test_local_scope()
           let k = 1
           end
       end;

julia> @code_warntype test_local_scope()

Locals
  k::Int64

julia> function test_local_scope()
           for k = 1
           end
       end;

julia> @code_warntype test_local_scope()

Locals
  @_2::Union{Nothing, Tuple{Int64, Nothing}}

As you can see, there is a yellow-color Union associated with for, which is not the case with let.

The intent of for is for repeated execution of code. The intent of let is to create a scope. Make your code more readable and maintainable by using the language constructs for their intended purpose, rather than for their side effects. Seeing a for loop abused as you do above will likely make most readers wonder at the point of it.

You’re right.

But my question is still

function work(b)
    sleep(2)
    println(b)
end

for i = [10, 20, 30]
    for j = i+1:i+4
        t = Task(() -> work(j))
        schedule(t)
    end
    sleep(1)
end

I “spawned” 3 * 4 tasks here.
The first 4 tasks are scheduled when i = 10, surely.
But those first 4 tasks might only be executed when i == 20 or i == 30, by which time j no longer holds the old values, e.g. j is now 21, 22, 23, 24.
Will the first 4 tasks still use the old j’s, i.e. 11, 12, 13, 14? Why is this guaranteed?


Imagine it is expanded as

i = 10
for j = [11, 12, 13, 14]
    body1(j)
end
i = 20
for j = [21, 22, 23, 24]
    body2(j)
end
i = 30
for j = [31, 32, 33, 34]
    body3(j)
end

Is the j serving as the arg to body1 the same thing as the j serving as the arg to body2?
I’m really confused.

I can’t find it in the docs right now, but the loop variable is bound anew for every iteration, i.e.

for i in 1:N
    ...
end

is just like

for i in 1:N
    let i=i
        ...
    end
end
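
A sketch that makes the per-iteration binding visible, using made-up helper names. Closures created in a for loop each keep their own j, whereas closures created in a while loop over a single outer variable all share it:

```julia
# Each iteration of `for` gets a fresh j, so every closure keeps its own value.
function closures_from_for()
    fs = Function[]
    for j in 11:14
        push!(fs, () -> j)
    end
    return [f() for f in fs]
end

# Here there is a single j for the whole function, shared by all closures.
function closures_from_while()
    fs = Function[]
    j = 11
    while j <= 14
        push!(fs, () -> j)
        j += 1
    end
    return [f() for f in fs]
end

closures_from_for()     # returns [11, 12, 13, 14]
closures_from_while()   # returns [15, 15, 15, 15]
```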

Btw, the reason a for loop always involves a union with Nothing is because it’s actually rewritten:

for item in iter   # or  "for item = iter"
    # body
end

is translated into:

next = iterate(iter)
while next !== nothing
    (item, state) = next
    # body
    next = iterate(iter, state)
end

And iterate will return nothing when iteration is finished.
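
The translation can be run by hand; manual_for below is a made-up name for a function that does exactly what the rewritten loop does. The last two lines show where the Tuple{Int64, Nothing} in the earlier @code_warntype output comes from when the "collection" is the single number 1:

```julia
# Hand-written version of the lowered loop; collects the items it visits.
function manual_for(iter)
    seen = Any[]
    next = iterate(iter)
    while next !== nothing
        (item, state) = next
        push!(seen, item)
        next = iterate(iter, state)
    end
    return seen
end

manual_for(11:14)      # visits 11, 12, 13, 14

iterate(1)             # (1, nothing): a number iterates over itself once
iterate(1, nothing)    # nothing: iteration is finished
```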

Thanks, this description is very clear and reassuring.

It would be even better if someone could share a link to where this behavior is documented.

I have a proposition, based on the discussions above

  • the following line 1 and line 2 are completely equivalent operations
... some background
v = rand(8)
i = 7
for j = view(v, 1:i) Threads.@spawn work(j) end # line 1
for ȷ = i:i    Threads.@spawn work($(v[ȷ])) end # line 2
... some background

If there is no disagreement, I think this topic is coming to an end.

I’ve gained some deeper understanding (I hope so):

Taking this as an example

julia> function main()
           for j = RANGE Threads.@spawn work( $(f(g(j))) ) end
       end;

julia> @code_lowered main()
...
2 ┄ 
│   %10 = Main.f
│   %11 = Main.g
│   %12 = j
│   %13 = (%11)(%12)
│   %14 = (%10)(%13) # [COMMENT] THIS IS a _SNAPSHOT_ OF `f(g(j))`
│         230 = %14
│   %16 = Base.Threads.Task
│   %17 = Main.:(var"#3#4") # [COMMENT] THIS IS `work` !
│   %18 = 230
│   %19 = Core.typeof(%18)
│   %20 = Core.apply_type(%17, %19)
│   %21 = 230
│         #3 = %new(%20, %21)
│   %23 = #3
│         task = (%16)(%23) # [COMMENT] TASK CREATED
....

I can see that the consequence of using $ is the cut-and-dried static variable %14, which is later again assigned to %21 and %23. Therefore, the snapshot value of f(g(j)) is already decided when the respective task is created.
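
The effect of $ can also be observed at run time. In this sketch (function and variable names are made up) a Base.Event holds both tasks back until x has been reassigned, so the comparison does not depend on timing:

```julia
function spawn_snapshot()
    gate = Base.Event()
    x = 1
    captured = Threads.@spawn begin
        wait(gate)
        x            # reads the captured variable when the task runs
    end
    snapshot = Threads.@spawn begin
        wait(gate)
        $x           # value copied into the task when it was created
    end
    x = 2
    notify(gate)     # release both tasks only after the reassignment
    return fetch(captured), fetch(snapshot)
end

spawn_snapshot()   # returns (2, 1)
```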

This is not the case for work, which is assigned to %17 but is not used afterwards, except in the Core.apply_type call. This suggests that work is volatile: if work changes later, the execution of the respective task may use the changed work.