I’m trying to work out a good template for the parallel part of my experiments. I’m looking for rock-solid example code to copy and extend, based on the current ecosystem.
Usually, I’m in the embarrassingly || space, which makes it all the more embarrassing that I’m not sure about the best practice. I’m faced with a wide array of options: Threads, Distributed, ThreadsX, Dagger, FLoops, Transducers, etc.
On a single machine (though this might move to a cluster later), I want to save results in a dictionary:
Dict(n => expensive_calculation(n) for n in nlist)
Right now I’ve hit on the following pattern, which I’m posting here for constructive criticism.
using Distributed
using BangBang
using MicroCollections

addprocs(4)

@everywhere function expensive_calculation(n)
    sleep(3)  # stand-in for the real work
    return n => n^2
end

function parallel_dict_computation(n)
    result_dict = EmptyDict()  # from MicroCollections
    # @spawnat :any is the non-deprecated spelling of Distributed's @spawn
    tasks = @sync [@spawnat :any expensive_calculation(i) for i in 1:n]
    for task in tasks
        # BangBang's push!! widens EmptyDict to a concrete Dict as pairs arrive
        result_dict = push!!(result_dict, fetch(task))
    end
    return result_dict
end

result_dictionary = parallel_dict_computation(10)
println(result_dictionary)
Here are my questions:
1.) How would YOU write this code? Is there any benefit to adding in Transducers or something like that?
2.) Is it elegant to put all the tasks in an array and fetch them later? Is it better to build the Dict directly inside @sync with a lock somehow? (I sketch what I mean after the questions.)
3.) Is the @sync necessary? Nothing bad seems to happen when I omit it here. How do I know whether a spawn is in a @sync’s scope? (See the small example after the questions.)
4.) I keep accidentally rerunning addprocs. Is there a way to manage the number of procs automatically? Is there a risk of having too many procs? (The guard I currently use is sketched below.)
5.) How do threads and procs interact? Does it matter how many threads I start julia with?
6.) And while I’m here, more sophisticated reductions: if I have an associative operator expensive_op(x, y), what is the best way to parallelize the reduction piece as well? Something like a parallel mapreduce(expensive_fun, expensive_op, list). (I sketch the shape I’m imagining below.)
In the typical case I can safely assume the scheduler overhead is negligible compared to the expensive calls, but the distribution of their execution times follows a power law.
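For concreteness on question 2, here is the kind of lock-based construction I mean, sketched with Threads.@spawn on a single machine rather than Distributed (expensive_calculation as defined above; I haven’t stress-tested this):

function parallel_dict_with_lock(n)
    result_dict = Dict{Int,Int}()
    lk = ReentrantLock()
    @sync for i in 1:n
        Threads.@spawn begin
            pair = expensive_calculation(i)  # runs on a thread, not a worker
            lock(lk) do
                push!(result_dict, pair)  # Dict isn't thread-safe, hence the lock
            end
        end
    end
    return result_dict
end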
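On question 3, my mental model is that @sync waits for every task spawned lexically inside its block, so in

@sync begin
    @spawnat :any expensive_calculation(1)  # registered with the enclosing @sync
    @spawnat :any expensive_calculation(2)  # likewise
end
println("done")  # both remote calls have finished by here

"done" only prints after both remote calls return. Is that the right way to think about it, and does a spawn inside a function called from the block count?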
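On question 4, the guard I’ve hit on so far is below (TARGET_WORKERS is just a number I picked); rerunning it is a no-op once the workers exist, but maybe there’s a more standard mechanism:

using Distributed

const TARGET_WORKERS = 4

# nprocs() counts the master process too, hence the +1
n_missing = TARGET_WORKERS + 1 - nprocs()
n_missing > 0 && addprocs(n_missing)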
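And on question 6, this is roughly the divide-and-conquer shape I’m imagining, written with threads; expensive_fun and expensive_op are placeholders from the question, op is assumed associative, and xs nonempty:

function pmapreduce(f, op, xs)
    length(xs) == 1 && return f(xs[1])
    mid = length(xs) ÷ 2
    # recurse on the left half in a fresh task, keep the right half here
    left = Threads.@spawn pmapreduce(f, op, @view xs[1:mid])
    right = pmapreduce(f, op, @view xs[(mid + 1):end])
    return op(fetch(left), right)
end

pmapreduce(expensive_fun, expensive_op, collect(1:100))  # hypothetical names from above

Splitting all the way down to single elements seems like it should cope with the power-law run times better than pre-chunking into nworkers() pieces, but that’s exactly the kind of thing I’d like confirmed.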
Thanks very very much! (Also, is anything happening with JuliaFolds2?)