According to the manual (Parallel Computing · The Julia Language), there are three categories of parallel programming in Julia: coroutines (green threading), multi-threading, and multi-core/distributed processing.
I would like to test a few things, each with a representative of one of these techniques. What could I use for that?
I guess that Base.@task belongs to the coroutines, Threads.@threads / Threads.@spawn to the multi-threading category,
and Distributed.@distributed to the multi-core/distributed group.
Are these assumptions correct so far?
If I want to write one sample program per category, does this selection make sense (Base.@task, Threads.@threads, Distributed.@distributed)?
I only have one PC at my disposal, and as far as I understand Distributed runs on different PCs. Is there any representative of the multi-core/distributed category that I could use instead?
Does asyncmap belong to this group?
You can start multiple local worker processes on the same machine using the -p command line parameter.
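As a rough sketch of what that looks like from inside a session: addprocs(2) from the Distributed standard library starts two local workers, much like launching with julia -p 2. The worker count of 2 and the sum-of-squares loop are arbitrary choices for illustration.

```julia
using Distributed

# Start 2 local worker processes (similar to launching with `julia -p 2`).
# The count 2 is an arbitrary choice for this sketch.
addprocs(2)

# Each worker is a separate Julia process on the same machine.
println("worker ids: ", workers())

# @distributed with a reducer splits the range across the workers
# and combines the partial results with (+).
total = @distributed (+) for i in 1:100
    i^2
end
println("sum of squares: ", total)

# Shut the workers down again.
rmprocs(workers())
```

This runs entirely on one PC, so it can serve as the representative of the multi-core/distributed category.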
Workers do not share memory, so you will get comparable results to a small cluster on a fast network, as long as you have enough cores. Check the docs here:
asyncmap belongs to the coroutine (aka async) family.
Please note that coroutines are not parallel, only concurrent, so you will get a performance benefit from them only when using blocking operations like I/O. But you have to learn them if you want to understand the others.
I haven’t really looked into multithreading yet, but I do make use of Distributed quite a bit. My workflow usually goes like the following. Suppose you have the following computationally expensive function:
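A small sketch of that point, assuming sleep as a stand-in for a blocking I/O call (fake_io is a made-up name): three tasks that each block for half a second finish in roughly half a second in total, because each one yields while it waits.

```julia
# Hypothetical stand-in for a blocking I/O call.
# sleep yields to the scheduler, so other tasks can run in the meantime.
function fake_io(i)
    sleep(0.5)
    return i * 10
end

t0 = time()
tasks = [@async fake_io(i) for i in 1:3]   # create and schedule 3 tasks
results = fetch.(tasks)                    # wait for all of them to finish
elapsed = time() - t0

println(results)
println("elapsed: ", round(elapsed, digits = 2), " s")

# asyncmap wraps the same pattern in a single call.
results2 = asyncmap(fake_io, 1:3)
println(results2)
```

If fake_io did pure computation instead of sleeping, the tasks would simply run one after another and nothing would be gained.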
function work(i)
    # a function that does a lot of work (i.e. is computationally expensive)
end
Next I call addprocs(n), which launches n workers, i.e. Julia processes started with the --worker flag. Then I mainly use the pmap function to run parallel computations, so something like
results = pmap(1:10) do i
    w = work(i)
    p = process(w) # will run on the worker process!
end
You have to be careful here. If you have n workers all running work(), you have to make sure there is sufficient memory on your system. Remember that all the results from work() are returned to the variable results, so you must ensure there is memory left for this collection as well.
Another caveat is that pmap is only really useful when work() is computationally expensive to offset the overhead. If the work function is simple and fast, it’s better to use multithreading.
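For the cheap-work case, a minimal multithreading sketch might look like the following. The arrays xs and ys and the squaring loop are made up for illustration; start Julia with more than one thread (for example julia -t 4) to see an effect.

```julia
using Base.Threads

# Cheap per-element work. Threads share memory, so nothing has to be
# serialized and sent between processes as with pmap.
xs = collect(1:1000)
ys = similar(xs)

@threads for i in eachindex(xs)
    # Each iteration writes only to its own slot, so there is no data race.
    ys[i] = xs[i]^2
end

println("threads available: ", nthreads())
println("sum of squares: ", sum(ys))
```

With a single thread the loop still runs correctly, just without any speedup.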
You need to run (for every function that needs to run on worker processes)
@everywhere function process(w)
    # do stuff
end
This will define process on all workers. Right now, you are only defining process on your main process (actually, from your error message, you aren’t even defining that). Remember I gave you pseudocode.
The second one doesn’t error because the anonymous function (i -> work(i)) is implicitly @everywhere’ed, so it’s defined on all workers.
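Putting the pieces together, a complete version of the pseudocode could look roughly like this. The bodies of work and process are placeholders I made up (a short sleep and some arithmetic), and 2 workers is an arbitrary choice.

```julia
using Distributed

addprocs(2)   # arbitrary number of local workers for this sketch

# Named functions have to be defined on every process.
@everywhere function work(i)
    sleep(0.1)        # placeholder for an expensive computation
    return i^2
end

# Placeholder post-processing step, also defined on every process.
@everywhere process(w) = w + 1

# The do-block is an anonymous function; pmap ships it to the workers itself.
results = pmap(1:10) do i
    w = work(i)
    process(w)        # last expression is the value returned for this i
end

println(results)

rmprocs(workers())    # shut the workers down again
```

Dropping either @everywhere makes the workers fail with an UndefVarError, which is the kind of error discussed above.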