I really don’t know what I’m asking here, but:
When writing code, I frequently find that the points where different sections of the code actually depend on each other are sparsely scattered across it. This does not resemble the case where each and every line linearly depends on the line before it. What is the accepted way to optimize for that? Should we be `@async`- and `@sync`-ing parts of the code to spawn off the temporally independent sections, collecting their results only when they are needed?
I'm obviously confused about concurrency and parallel computing, but this seems like both a really common issue and one whose solution should lead to a dramatic improvement in performance.
Thanks for any enlightenment!
Parallelism vs. Concurrency
Concurrency (where `***` means that the task is running at this moment):
task 1: **** *********
task 2: **** ***
Task 1 and task 2 do not strictly follow each other, but instead their executions interleave. Concurrent tasks may use different CPU cores or share the same one.
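As a minimal sketch of that interleaving (not from the original post; the task bodies and timings are made up), two `@async` tasks on a single thread take turns whenever one of them yields, here via `sleep`:

```julia
# Hypothetical example: two @async tasks whose steps interleave on one thread.
function concurrent_demo()
    t1 = @async begin
        for i in 1:3
            println("task 1, step $i")
            sleep(0.1)          # sleeping yields, letting the other task run
        end
    end
    t2 = @async begin
        for i in 1:3
            println("task 2, step $i")
            sleep(0.1)
        end
    end
    wait(t1)
    wait(t2)                    # block until both tasks have finished
end

concurrent_demo()
```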
Parallelism:
task 1: *************
task 2: *******
Both tasks execute at the same time. To have tasks running truly in parallel, you have to give them separate CPU cores.
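For comparison, here is a rough sketch of parallelism using threads. This is an assumption on my part about a convenient way to show it, not something from the original answer, and it requires a recent Julia started with more than one thread (e.g. `julia -t 2`):

```julia
# Hypothetical example: two computations that may run on separate cores,
# assuming Julia was started with at least two threads (e.g. `julia -t 2`).
function parallel_demo()
    t1 = Threads.@spawn sum(rand(10^7))   # scheduled on any available thread
    t2 = Threads.@spawn sum(rand(10^7))
    fetch(t1) + fetch(t2)                  # wait for both results and combine
end

println(parallel_demo())
```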
`@sync`, `@async` and `Task`
In Julia, `@sync` and `@async` are convenient interfaces to a Julia feature called tasks, represented at a low level by the corresponding type `Task`. Tasks, also known as coroutines, are the primary tool for concurrency in Julia. Essentially, they are functions that allow multiple entrances and exits. For example, with normal functions `f()` and `g()`, the call stack may look like:
f: **** ****
g: ****
That is, `f()` calls `g()`, and until `g()` is finished, `f()` is suspended. However, if `f()` and `g()` are tasks, the diagram may look like:
f: **** ** **
g: **** ***
And at the end both functions/tasks may still be running, keeping their state alive and ready for further processing.
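As a rough illustration of that "multiple entrances and exits" behavior (names here are made up), a `Channel` constructed from a function runs that function as a task which suspends at each `put!` and resumes, with its local state intact, each time a value is taken:

```julia
# Illustrative sketch: a task that suspends at put! and resumes with its
# local state (`state`) preserved each time a value is taken from the channel.
counter = Channel() do ch
    state = 0
    for _ in 1:3
        state += 1
        put!(ch, state)   # exit point: suspend until someone takes the value
    end
end

for x in counter          # each take! resumes the task where it left off
    println("got $x")
end
```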
Depending on your use case, you may be happy with the high-level `@async` and `@sync`, as well as `produce` and `consume`, or you may create tasks manually, coordinating them using `Condition`s and `Channel`s or even the low-level `yieldto()` function.
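For instance (an illustrative sketch with made-up names, not code from the original post), `@sync`/`@async` and a `Channel` can be combined to spawn several tasks and collect their results:

```julia
# Illustrative sketch: spawn a few tasks with @async, wait for all of them
# with @sync, and collect their results through a buffered Channel.
results = Channel{Int}(32)

@sync for i in 1:4
    @async put!(results, i^2)   # each task sends its result to the channel
end
close(results)                  # no more values will be produced

println(collect(results))       # e.g. [1, 4, 9, 16]
```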
Note that none of these involve parallel processing, which is a separate topic.
Thanks for the informative answer!
The DataFlow.jl package seems like it could be utilized to automatically decide which sections of the code could be parallelized…
Thanks for starting an interesting discussion regarding concurrency and parallelism. I come from a Python background, and my managers often ask me to make my solutions parallel when they are mostly I/O-bound; I make those solutions concurrent instead, and everyone is happy in the end.
A simple question arises, though: is there any decorator-like macro in Julia, or the possibility of having one (say `@check_bound`), that could tell you whether a given function is CPU-bound or I/O-bound?