@Shazman the repository contains a minimal example that you can download and run. You can modify the example to your needs, but that is off-topic here.
@stephenll There is one possible use case: running computations in parallel when the individual computation times vary significantly.
From the @threads documentation:
The iteration space is split among the threads, after which each thread writes its thread ID to its assigned locations
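To make the quoted behaviour concrete, here is a sketch along the lines of that documentation example. It assumes Julia 1.5 or newer for the `:static` schedule argument, which requests the fixed, up-front split the quote describes; newer Julia versions default to a different schedule, so check the `@threads` docs for your version.

```julia
using Base.Threads: @threads, threadid, nthreads

a = zeros(Int, 10)

# `:static` pins one contiguous chunk of 1:10 to each thread before the loop starts.
@threads :static for i in 1:10
    a[i] = threadid()   # record which thread handled iteration i
end

println("nthreads = ", nthreads())
println(a)              # each thread ID should appear as one contiguous run
```

Run it with several threads (e.g. `julia -t 4`) to see the range divided among them; with a single thread every entry is the same ID.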
So basically, when running your parallel computation with Threads.@threads, each computation is assigned to a thread a priori, whereas pmap assigns each computation to a worker as soon as one becomes available. For example, if one computation takes an order of magnitude longer than the others, with Threads.@threads the thread that receives the long one still has to run its other assigned tasks afterwards, whereas with pmap one worker works on the long computation while the other workers handle the faster ones.
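The difference can be made concrete with a small model of the example below, where task `id` takes `id` seconds. This is a sketch under stated assumptions: the helper names `static_makespan` and `dynamic_makespan` are hypothetical, the model ignores all scheduling and communication overhead, and both the contiguous-chunk split (for `@threads`) and the "next item to the first idle worker" rule (for `pmap`) are simplifications of what the real schedulers do.

```julia
# Back-of-the-envelope model: task durations only, no overhead of any kind.

# Static: split the items into `p` contiguous chunks up front (the first
# `extra` chunks get one more item); each chunk runs sequentially on one thread.
function static_makespan(durations, p)
    len, extra = divrem(length(durations), p)
    finish = zeros(p)                  # total busy time of each thread
    start = 1
    for k in 1:p
        chunk = len + (k <= extra ? 1 : 0)
        finish[k] = sum(durations[start:start+chunk-1])
        start += chunk
    end
    return maximum(finish)
end

# Dynamic: hand the next item, in order, to whichever worker is free first.
function dynamic_makespan(durations, p)
    free_at = zeros(p)                 # time at which each worker becomes idle
    for d in durations
        k = argmin(free_at)            # earliest-available worker (lowest index on ties)
        free_at[k] += d
    end
    return maximum(free_at)
end

durations = collect(1.0:10.0)          # task `id` takes `id` seconds, as in the example
println(static_makespan(durations, 1))   # 55.0
println(static_makespan(durations, 4))   # 19.0
println(dynamic_makespan(durations, 4))  # 18.0
```

Under this model one thread needs 55 s and four dynamically scheduled workers need 18 s, close to the two timings reported below; that suggests, though it does not prove, that the benchmark ran with a single thread and four workers. With four threads the static split gives 19 s, so under this model most of the measured gap comes from the thread count rather than from the scheduling strategy.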
I've added an example below. One thing I have not considered is the possible slowdown due to the additional overhead of pmap; how significant it is depends heavily on the application you're dealing with.
┌ Warning: running threaded
└ @ Main ~/Desktop/parallelcompare.jl:29
#= /Users/bart/Desktop/parallelcompare.jl:30 =# @benchmark(threaded_tasks()) = Trial(55.044 s)
┌ Warning: running distributed
└ @ Main ~/Desktop/parallelcompare.jl:31
#= /Users/bart/Desktop/parallelcompare.jl:32 =# @benchmark(distributed_task()) = Trial(18.013 s)
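# Start Julia with both worker processes and threads, e.g.
#   julia -p 4 -t 4 parallelcompare.jl
# otherwise the comparison runs on a single thread and/or without extra workers.
using Distributed
using BenchmarkTools

# Define the work function on the master and on every worker process.
@everywhere begin
    using Logging
    Logging.disable_logging(LogLevel(0))  # silence @info (level 0) and below
    # Simulated job: task `id` just sleeps for `duration` seconds.
    function task(id::Int; duration=1)
        @info "starting task $(id) on $(Threads.threadid()) (duration: $(duration))"
        sleep(duration)
        @info "finished task $(id) on $(Threads.threadid())"
    end
end

# @threads divides 1:n into one chunk per thread up front.
function threaded_tasks(n=10)
    @info "running on $(Threads.nthreads()) threads"
    Threads.@threads for id in 1:n
        task(id, duration=id)
    end
end

# pmap hands the next item to whichever worker becomes idle first.
function distributed_task(n=10)
    @info "running on $(length(Distributed.workers())) workers"
    pmap(x -> task(x, duration=x), 1:n)
end

function main()
    @warn "running threaded"
    @show @benchmark threaded_tasks()
    @warn "running distributed"
    @show @benchmark distributed_task()
end

main()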
Just chiming in: check out this package, https://github.com/LCSB-BioCore/DistributedData.jl. It is pretty awesome and makes lots of distributed computing tasks very straightforward… Potentially a good inclusion in the ultimate guide.
There's ThreadPools.jl to handle the varying work case.
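For a dependency-free comparison, Base's `Threads.@spawn` (Julia 1.3+) already creates one task per item and lets the scheduler run each on any available thread, so a long item does not hold up a pre-assigned batch of short ones. This sketch is not ThreadPools.jl's API; `blocking_work` and `spawned_tasks` are hypothetical names. It uses `Libc.systemsleep`, which blocks its thread, because `sleep` yields to the scheduler and would let spawned tasks overlap even on a single thread, hiding the effect of the thread count.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the example's `task`: blocks its thread for `duration` s.
function blocking_work(id; duration=id)
    Libc.systemsleep(duration)   # does not yield, unlike `sleep`
    return id
end

# One task per item; each starts on whichever thread is free.
function spawned_tasks(n=10; scale=1.0)
    tasks = [@spawn blocking_work(id; duration=scale * id) for id in 1:n]
    return fetch.(tasks)         # wait for all tasks and collect their results in order
end

println("running on $(nthreads()) threads")
@time spawned_tasks(10; scale=0.1)
```

With `julia -t 4` the timing should approach the dynamic-scheduling case rather than the static one, though the actual placement of tasks is up to Julia's scheduler.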
PRs are welcome in the repository to add links to these packages. The more centralized these resources are, the better for new users.