I am using this week to learn the parallel programming features of the language in more depth. I confess that I am a bit confused by the terminology used throughout the documentation, and would like to ask a few questions.
In terms of execution pools, I understand that we have two types in general: processes and threads. The Julia documentation refers to processes as either `procs()` or `workers()` (i.e. all processes minus the master process in a `:master_worker` topology), and refers to real threads as `1:Threads.nthreads()`, correct? I also saw a third concept in the documentation called `Coroutine`, or even “green threads”. From what I grasped, “green threads” aren’t real threads, so what are they? Should new users learn about them at all?
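To make the question concrete, here is a minimal sketch of the three things I am trying to tell apart. The worker count and the sleep duration are arbitrary choices of mine, and the comments reflect my current reading of the docs rather than anything authoritative.

```julia
using Distributed

# Processes: add two local worker processes (the count is arbitrary).
addprocs(2)
@show procs()     # all process ids, including the master (id 1)
@show workers()   # worker ids only

# Threads: the number is fixed when Julia starts, e.g. `julia --threads 4`.
@show Threads.nthreads()

# Tasks (coroutines / "green threads"): created and scheduled inside a process.
t = @async begin
    sleep(0.1)    # yields to other tasks while waiting
    myid()        # id of the process this task runs in
end
@show fetch(t)
```

If I read this correctly, the task is neither a new process nor a new OS thread, which is exactly the part I would like confirmed.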
Can you please explain the difference between `Channel`, `RemoteChannel` and `Future`? First, I understand that these three concepts are restricted to process pools, so they have nothing to do with threads. Second, I understand that a `Channel` is a buffer where you can place and take data, and that a `RemoteChannel` is the same concept with the only difference that it also works across processes. If that is the case, I wonder why the documentation emphasizes this difference. Couldn’t it just mention `Channel` in general? To me it feels like explaining the general concept of `Array` via internal types like `OffsetArray`. Third, I understand that a `Future` is just a channel that is returned by a remote function call? Please correct me if I am wrong.
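Here is a small sketch of how I currently use the three types side by side. The buffer size of 4 and the single worker are assumptions made only for illustration.

```julia
using Distributed
addprocs(1)
w = first(workers())

# Channel: a buffer that lives inside the current process.
ch = Channel{Int}(4)
put!(ch, 1)
local_value = take!(ch)

# RemoteChannel: a handle to a Channel stored on process `w`,
# usable from any process that holds the handle.
rch = RemoteChannel(() -> Channel{Int}(4), w)
put!(rch, 2)                                 # put from the master process
remote_value = remotecall_fetch(take!, w, rch)  # take on the worker

# Future: returned by `remotecall`; holds the single result of one call.
fut = remotecall(+, w, 1, 2)
future_value = fetch(fut)
```

In this sketch the `RemoteChannel` can be written to repeatedly, while the `Future` only ever carries the one result of its call, which may be part of the distinction I am missing.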
I started reading the docs of `pmap` from the Distributed section (i.e. the section that deals with process pools) and noticed that it has an intriguing option `distributed=false` that makes it possible to send work to multiple “tasks” instead of processes. In this context, it seems that “tasks” are a specific thing: Julia coroutines. Can we submit work to remote processes running on remote machines, and use multi-threading there with a specific number of real `Threads` threads instead of “green threads” as well? Also, does this functionality necessarily need to live in `pmap`? How can we deal with this hybrid type of parallelism more explicitly?
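The hybrid setup I have in mind looks roughly like the sketch below. It assumes Julia 1.5 or later (for the `--threads` flag) and uses local workers as a stand-in for remote machines; `threaded_sum_of_squares` is a hypothetical helper I made up for the example, not something from the docs.

```julia
using Distributed

# Task-based variant: no worker processes involved, only tasks in this process.
squares = pmap(x -> x^2, 1:4; distributed=false)

# Hybrid variant: two worker processes, each started with two real threads.
addprocs(2; exeflags="--threads=2")

# Hypothetical helper, defined on every process so workers can call it.
@everywhere function threaded_sum_of_squares(xs)
    partial = zeros(Int, length(xs))
    Threads.@threads for i in eachindex(xs)
        partial[i] = xs[i]^2      # each iteration writes its own slot
    end
    return sum(partial)
end

# Processes split the chunks; threads split the work inside each chunk.
chunks = [1:10, 11:20, 21:30]
results = pmap(threaded_sum_of_squares, chunks)
threads_per_worker = [remotecall_fetch(Threads.nthreads, w) for w in workers()]
```

Here the process-level and thread-level parallelism are spelled out separately, which is the kind of explicit control I am asking about.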
I understood that all functions with the `remote*` prefix refer to process pools. We have the main function called `remotecall` that allows calls of functions on remote processes as expected, and variants like `remotecall_fetch` and `remotecall_wait` that do something extra like fetching the result directly or blocking the execution until the remote process finishes execution. However, I couldn’t understand the `remote_do` documentation. What is it? I’ve also understood that all these `remote*` functions come in two flavors: `remote*(f, pid, ...)` and `remote*(f, pool, ...)`. The first flavor assigns the work to a specific remote process, and the second flavor waits for any process in the pool to become available before calling the first flavor.
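For reference, this is a minimal sketch of how I read the different calls. The choice of `sqrt` and of two local workers is arbitrary, and the comments are my interpretation of the docstrings.

```julia
using Distributed
addprocs(2)
w = first(workers())

# remotecall: returns a Future immediately; fetch it later.
fut = remotecall(sqrt, w, 16.0)
a = fetch(fut)

# remotecall_fetch: runs the call and returns the value directly.
b = remotecall_fetch(sqrt, w, 16.0)

# remotecall_wait: blocks until the call has finished, then returns the Future.
fut2 = remotecall_wait(sqrt, w, 16.0)
c = fetch(fut2)

# remote_do: runs the call for its side effect and returns nothing.
d = remote_do(identity, w, 1)

# Pool flavor: the call goes to whichever worker in the pool is free.
pool = WorkerPool(workers())
chosen = remotecall_fetch(myid, pool)
```

From this sketch, `remote_do` looks like a fire-and-forget call with no way to retrieve a result, but I am not sure when that is preferable to `remotecall`.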
I would like to help improve the documentation after I have a more solid understanding of the concepts. If you could reply to specific points, that would help a lot.