I’m utterly inexperienced in the usage of tasks in any programming language. I’m having trouble in my present application and came up with the following example to try and explain to myself what is going wrong. What I’m interested in here is whether the narrative I’ve constructed below is actually correct. So, my toy example follows:
function f_test()
    tt1 = @async f_task()
    cc = 0
    while true
        mod(cc, 50000000) == 0 && println("Up to cc = $(cc)")
        cc += 1
        cc == 50000000000 && break
    end
    println("f_test complete")
end

function f_task()
    sleep(1)
    sleep(1)
    println("Task done")
end
So, f_test() uses an @async call on f_task(). This task should sleep for 2 seconds and then print “Task done”. However, in the meantime f_test enters a very long while loop that will take on the order of 10 to 15 seconds to run and prints output to the REPL sporadically.
The behaviour I observe is that even though f_task() is supposed to finish in 2 seconds, it is not able to complete until the while loop finishes. My narrative is that it simply never gets a chance to complete, because I’m doing everything on a single thread and the while loop is hogging all the processing time.
My question: Is this narrative correct, or have I completely misunderstood how this stuff works?
Thanks in advance.
I just watched a class about concurrency, and the lecturer mentioned that a program has to reach a yield point before another task can be scheduled.
For example, if your function does I/O, network calls, or sleep, it doesn’t need much CPU computation, and it has many yield points where the scheduler can switch to another task.
But a heavy for loop has no yield points, for performance reasons; while it’s running you cannot even press Ctrl-C to stop the program, because it isn’t listening.
For a better understanding you should watch the class I mentioned; it explains this much better than I have here…
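To make the yield-point idea concrete, here is a minimal sketch (the function name demo_yield_points and the event labels are made up for illustration). It assumes the @async task stays on the same thread as its parent, so it cannot start until the parent reaches a yield point such as wait:

```julia
# Hypothetical demo: record the order of events to see when the
# background task actually gets to run.
function demo_yield_points()
    events = String[]

    # Schedule a task; this only puts it in the queue, it does not run it yet.
    t = @async push!(events, "task ran")

    # CPU-only loop: no I/O, no sleep, no yield, so no yield point is reached.
    s = 0
    for i in 1:10_000_000
        s += i
    end
    push!(events, "loop finished")

    # wait() is a yield point: the current task blocks and the scheduler runs `t`.
    wait(t)
    push!(events, "after wait")
    return events
end

events = demo_yield_points()
println(events)
```

The task is queued before the loop starts, yet its entry should only show up after "loop finished", because nothing in the loop hands control back to the scheduler.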
I’ll watch it. Chris’s stuff is great.
And it sounds like my understanding is correct then. The issue is that my loop is too greedy. I experimented with it by popping in a sleep(0.001) every time the mod condition is triggered, and lo and behold, my task was now able to complete after 2 seconds.
Thanks for responding. Very helpful.
Do not use sleep(0.001); there is a special function for that: yield.
Also, it usually makes sense to @sync all @async tasks. It helps with error propagation and is generally more readable, since it explicitly delimits the asynchronous blocks.
function f_task()
    sleep(2)
    println("Task done")
end

function f_test()
    @sync begin
        @async f_task()
        @async begin
            cc = 0
            while true
                mod(cc, 50000000) == 0 && (println("Up to cc = $(cc)"); yield())
                cc += 1
                cc == 50000000000 && break
            end
            println("f_test complete")
        end
    end
end
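The error-propagation point can be sketched as follows (demo_sync_errors is a made-up name, and the exact wrapper type is what I would expect on recent Julia 1.x versions, so treat it as an assumption). Without @sync, a failure inside an @async task goes unnoticed unless something later waits on or fetches that task; with @sync, it is rethrown in the enclosing code:

```julia
# Hypothetical demo: a failing @async task inside @sync surfaces in the caller.
function demo_sync_errors()
    try
        @sync begin
            @async error("boom")   # this task fails
            @async sleep(0.1)      # this one finishes normally
        end
    catch err
        return err                 # the failure is rethrown once @sync ends
    end
    return nothing                 # only reached if nothing was thrown
end

err = demo_sync_errors()
println("Caught: ", typeof(err))
```

On the versions I am aware of, the caught value is a CompositeException that wraps one TaskFailedException per failed task.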
Thanks for responding. Just to be sure I understand: the call yield() basically checks if there are any other tasks that want to run, and if there are, it switches to them. Is that correct?
And if f_task() was a little more involved than the current version, i.e. if it contained its own loop, then it might need to also contain a yield() every once in a while to let the other loop run? Would this be a “normal” program structure?
So you could end up with two competing tasks that contain a loop, and each would every once in a while need to yield() to the other loop… would that be a reasonably normal structure?
No. As the documentation puts it:
Switch to the scheduler to allow another scheduled task to run. A task that calls this function is still runnable, and will be restarted immediately if there are no other runnable tasks.
So yield() is not checking anything; it just returns control to the scheduler, which in turn decides which task to run next.
Yes, if f_task() is more involved, it should also give up control at some point. And this structure with multiple yields is normal; this is how it should be. But again, it doesn’t give up control to another loop; it returns control to the central scheduler, which decides which task to run next. So you can have thousands of tasks; they just form a queue and are run one by one.
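As a rough illustration of two looping tasks that each hand control back to the scheduler, here is a small sketch (demo_two_loops and the :a / :b labels are invented for the example). Each task records a step and then calls yield():

```julia
# Hypothetical demo: two tasks with their own loops, each yielding regularly.
function demo_two_loops()
    trace = Tuple{Symbol,Int}[]
    @sync begin
        @async for i in 1:3
            push!(trace, (:a, i))
            yield()              # hand control back to the scheduler
        end
        @async for i in 1:3
            push!(trace, (:b, i))
            yield()              # hand control back to the scheduler
        end
    end
    return trace
end

trace = demo_two_loops()
println(trace)
```

On a typical single-threaded run the entries from :a and :b tend to alternate, but the interleaving is the scheduler's decision, not something either loop controls. The only thing each task guarantees is the order of its own steps.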
Alternatively you can use @spawn, in which case tasks can be scheduled on any of the available threads. But this is multithreading, with all of its downsides like race conditions. And you get parallelism only if the tasks run on different threads; if you have a single thread, then you again need to use yield() to return control.
using Base.Threads: @spawn

function f_task()
    sleep(2)
    println("Task done")
end

function f_test()
    @sync begin
        @spawn f_task()
        @spawn begin
            cc = 0
            while true
                mod(cc, 50000000) == 0 && println("Up to cc = $(cc)")
                cc += 1
                cc == 50000000000 && break
            end
            println("f_test complete")
        end
    end
end
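Whether the @spawn version actually runs in parallel depends on how many threads Julia was started with (for example via julia -t 4 or the JULIA_NUM_THREADS environment variable). A quick way to check, as a sketch with a made-up helper name demo_spawn_threads:

```julia
using Base.Threads: @spawn, nthreads, threadid

# Hypothetical helper: spawn a few tiny tasks and report which thread ran each.
function demo_spawn_threads(n::Integer = 8)
    tasks = map(1:n) do _
        @spawn threadid()        # each task returns the id of the thread it ran on
    end
    return fetch.(tasks)         # wait for every task and collect the results
end

ids = demo_spawn_threads()
println("Julia threads available: ", nthreads())
println("Thread ids used by spawned tasks: ", ids)
```

If nthreads() reports 1, the spawned tasks all share that one thread, and the long loop needs its yield() again for the other task to get a turn.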
This is very helpful to my understanding, thank you very much. Regarding this:
Is there any easy intuition for how the scheduler decides which task gets to run next, or is this a difficult thing to explain in a sentence or two? (As an aside, I read the other material in what you posted and have worked out that I can control it explicitly with yieldto if I want to.)
Unfortunately I do not know. There is very little code on the Julia side, and it quickly goes into some C implementation. I presume only core developers can give an answer.
No worries, thanks for responding.
EDIT: Also I found this GitHub issue, which suggests other people are also thinking about this.