I am using this week to learn the parallel programming features of the language in more depth. I confess that I am a bit confused by the terminology used throughout the documentation, and would like to ask a few questions.
Questions
- In terms of execution pools, I understand that we have two types in general: processes and threads. The Julia documentation refers to processes as either `procs()` or `workers()` (i.e. all processes minus the master process in a `:master_worker` topology), and refers to real threads as `1:Threads.nthreads()`, correct? I also saw a third concept in the documentation called `Task` or `Coroutine`, or even “green threads”. From what I grasped, “green threads” aren’t real threads, so what are they? Should new users learn about them at all?

- Can you please explain the difference between `Channel`, `RemoteChannel` and `Future`? First, I understand that these three concepts are restricted to process pools, so they have nothing to do with threads. Second, I understand that a `Channel` is a buffer where you can place and take data, and that a `RemoteChannel` is the same concept with the only difference that it also works across processes. If that is the case, I wonder why the documentation emphasizes this difference. Couldn’t it just mention `Channel` in general? To me it feels like explaining the general concept of `Array` via internal types like `SubArray` and `OffsetArray`. Third, I understand that a `Future` is just a channel that is returned by a remote function call? Please correct me if I am wrong.

- I started reading the docs of `pmap` from the Distributed section (i.e. the section that deals with process pools) and noticed that it has an intriguing option `distributed=false` that makes it possible to send work to multiple “tasks” instead of processes. In this context, it seems that “tasks” are a specific thing: Julia coroutines. Can we submit work to remote processes running on remote machines, and use multi-threading there with a specific number of real `Threads` threads instead of “green threads” as well? Also, does this functionality necessarily need to live in `pmap`? How can this hybrid type of parallelism be handled more explicitly?

- I understood that all functions with the `remote*` prefix refer to process pools. We have the main function called `remotecall` that calls functions on remote processes as expected, and variants like `remotecall_fetch` and `remotecall_wait` that do something extra, like fetching the result directly or blocking execution until the remote process finishes. However, I couldn’t follow the `remote_do` documentation. What is it? I’ve also understood that all these `remote*` functions come in two flavors: `remote*(f, pid, ...)` and `remote*(f, pool, ...)`. The first flavor assigns the work to a specific remote process, and the second flavor waits for any process in the pool to become available before calling the first flavor.
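To make the questions above concrete, here is a minimal sketch of how I currently read these pieces, so that replies can point at specific lines. It reflects my present understanding and may well be wrong in places. The worker count, buffer sizes and values are arbitrary choices for illustration, and the thread loop only uses more than one thread if Julia is started with something like `julia -t 4`.

```julia
using Distributed

# Tasks (coroutines, "green threads"): scheduled by Julia rather than by the OS.
t = @async 1 + 1                 # create and schedule a Task
task_result = fetch(t)           # wait for the task and return its value

# Channel: put and take data within a single process.
ch = Channel{Int}(4)             # buffered channel with room for 4 items
put!(ch, 10)
chan_result = take!(ch)

# Real threads: iterations are split across the available Julia threads.
squares = zeros(Int, 8)
Threads.@threads for i in eachindex(squares)
    squares[i] = i^2             # each iteration writes its own slot
end

# Processes: add two local workers (the count is arbitrary for this sketch).
pids = addprocs(2)
w = first(pids)

fut = remotecall(+, w, 1, 2)             # returns a Future immediately
fut_result = fetch(fut)                  # blocks until the value is available
direct = remotecall_fetch(+, w, 1, 2)    # call and fetch in one step
remote_do(identity, w, 0)                # no Future returned, result is discarded

# RemoteChannel: a handle to a Channel owned by one process (here, this one)
# that other processes can put! into and take! from.
rch = RemoteChannel(() -> Channel{Int}(1))
remotecall_wait(put!, w, rch, 42)        # the worker puts a value into it
rch_result = take!(rch)

# Pool flavor: a free worker is taken from the pool to run the call.
pool = WorkerPool(pids)
pool_result = remotecall_fetch(myid, pool)

rmprocs(pids)                            # shut the workers down again
```

If this reading is roughly right, then `remote_do` looks like a `remotecall` without the `Future`, and `Channel` is usable without any worker processes at all, which is part of what confuses me in the second question.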
I would like to help improve the documentation once I have a more solid understanding of the concepts. If you could reply to specific points, that would help a lot.