I am using this week to learn the parallel programming features of the language in more depth. I confess that I am a bit confused by the terminology used throughout the documentation, and I would like to ask a few questions.
Questions
- In terms of execution pools, I understand that there are two general types: processes and threads. The Julia documentation refers to processes as either `procs()` or `workers()` (i.e. all processes minus the master process in a `:master_worker` topology), and refers to real threads as `1:Threads.nthreads()`, correct? I also saw a third concept in the documentation called `Task` or `Coroutine`, or even “green threads”. From what I grasped, “green threads” aren’t real threads, so what are they? Should new users learn about them at all?
- Can you please explain the difference between `Channel`, `RemoteChannel` and `Future`? First, I understand that these three concepts are restricted to process pools, so they have nothing to do with threads. Second, I understand that a `Channel` is a buffer where you can place and take data, and that a `RemoteChannel` is the same concept with the only difference that it also works across processes. If that is the case, I wonder why the documentation emphasizes this difference. Couldn’t it just mention `Channel` in general? To me it feels like explaining the general concept of `Array` via internal types like `SubArray` and `OffsetArray`. Third, I understand that a `Future` is just a channel that is returned by a remote function call? Please correct me if I am wrong.
- I started reading the docs of `pmap` from the Distributed section (i.e. the section that deals with process pools) and noticed that it has an intriguing option `distributed=false` that makes it possible to send work to multiple “tasks” instead of processes. In this context, it seems that “tasks” are a specific thing: Julia coroutines. Can we submit work to remote processes running on remote machines, and use multi-threading there with a specific number of real `Threads` threads instead of “green threads” as well? Also, does this functionality necessarily need to live in `pmap`? How can this hybrid type of parallelism be handled more explicitly?
- I understood that all functions with the `remote*` prefix refer to process pools. We have the main function called `remotecall` that calls functions on remote processes as expected, and variants like `remotecall_fetch` and `remotecall_wait` that do something extra, like fetching the result directly or blocking execution until the remote process finishes. However, I couldn’t understand the `remote_do` documentation. What does it do? I’ve also understood that all these `remote*` functions come in two flavors: `remote*(f, pid, ...)` and `remote*(f, pool, ...)`. The first flavor assigns the work to a specific remote process, and the second flavor waits for any process in the pool to become available before calling the first flavor.
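To make the questions above concrete, here is a minimal sketch of how I currently read these pieces, using only the `Distributed` standard library. It assumes a single local worker started with `addprocs(1)`; the variable names (`w`, `ch`, `rch`, `fut`, `pool`) are just illustrative, and the comments describe my understanding, which may well need correcting.

```julia
using Distributed

addprocs(1)                       # assumption: one local worker process
w = first(workers())              # id of that worker (the master has id 1)

# Task / coroutine ("green thread"): scheduled by Julia inside this process.
t = @async 1 + 1
task_result = fetch(t)

# Channel: a buffer that lives in the current process.
ch = Channel{Int}(2)
put!(ch, 10)
channel_value = take!(ch)

# RemoteChannel: a handle to a Channel created on process `w`,
# which can be used from other processes.
rch = RemoteChannel(() -> Channel{Int}(2), w)
put!(rch, 20)
remote_value = take!(rch)

# remotecall returns a Future immediately; fetch waits for the value.
fut = remotecall(+, w, 1, 2)
future_value = fetch(fut)

# remotecall_fetch does the call and the fetch in one step.
fetched = remotecall_fetch(+, w, 3, 4)

# remotecall_wait blocks until the call has finished on `w`.
remotecall_wait(identity, w, 0)

# remote_do runs the call on `w` but returns nothing to fetch.
remote_do(identity, w, 0)

# Pool flavor: any worker in the pool may be chosen to run the call.
pool = WorkerPool(workers())
pool_pid = remotecall_fetch(myid, pool)

rmprocs(w)                        # shut the worker down again
```

With a single worker, `pool_pid` can only be `w`; with more workers in the pool it would be whichever one was free at the time of the call.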
I would like to help improve the documentation after I have a more solid understanding of the concepts. If you can please reply to specific points, that would help a lot.