How can I improve this parallel routine on mutually independent tasks?

I intend to solve N mutually independent tasks in parallel, with each task running on a single CPU. I found some example code online and adapted it into the following MWE.

I was wondering:

  1. Why is there a for loop over all CPUs, as in the line for p = 1:np? Will this decrease the speed?
  2. Given the type of parallel tasks I have, what would be the fastest way to do it?

I appreciate any comments.

Here is the MWE:

using Distributed
addprocs(Sys.CPU_THREADS-1)

# A simple function
@everywhere function fun_sleep()
    sleep(5)
    println("I'm worker number $(myid()), and I reside on machine $(gethostname()).")
    return myid()
end

# Parallel computing
function fun_parallel(fun_sleep::Function, num_task::Int64)
    temp = Vector(undef, num_task)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np # what is this for loop doing in here?
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > num_task
                            break
                        end
                        temp[idx] = remotecall_fetch(fun_sleep, p)
                    end
                end
            end
        end
    end
    return temp
end

num_task = 10
@time result = fun_parallel(fun_sleep, num_task)

An “easy” way to parallelize a number of independent tasks (computation(i) below) is to use multithreading, i.e. to start Julia with julia -t N, where N is the number of available physical cores, and then write

Threads.@threads for i in 1:Ntasks
    computation(i)
end
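If the results are needed afterwards, one common pattern is to preallocate an output vector and let each iteration write to its own slot. The following is only a minimal sketch: `computation` is a hypothetical stand-in for the real work, and `run_threaded` is an illustrative helper name, not something from the original post.

```julia
# Hypothetical stand-in for the real per-task work.
computation(i) = sum(abs2, 1:i)

# Illustrative helper: run f(1), ..., f(n) on the available threads.
function run_threaded(f, n)
    results = Vector{Int}(undef, n)   # one slot per task, so no locking is needed
    Threads.@threads for i in 1:n
        results[i] = f(i)             # each iteration writes only to its own index
    end
    return results
end

results = run_threaded(computation, 10)
```

Start Julia with `julia -t N` for this to use more than one thread; with a single thread the loop still runs, just serially.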

If the execution time of a task (computation(i)) varies strongly between tasks, you may want to go with

@sync for i in 1:Ntasks
    Threads.@spawn computation(i)
end

instead.
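To collect return values from spawned tasks, one option is to keep the `Task` objects and `fetch` them at the end. This is a sketch under assumptions: `uneven_computation` is a hypothetical task whose runtime grows with `i`, and `run_spawned` is an illustrative helper name.

```julia
# Hypothetical task whose cost grows with i, to mimic uneven workloads.
uneven_computation(i) = (sleep(0.01 * i); i^2)

# Illustrative helper: spawn one task per index, then wait for all results.
function run_spawned(f, n)
    tasks = [Threads.@spawn f(i) for i in 1:n]  # the scheduler assigns tasks to free threads
    return fetch.(tasks)                        # fetch blocks until each task has finished
end

results = run_spawned(uneven_computation, 10)
```

Because each task is scheduled separately, a thread that finishes a short task can pick up another one instead of idling.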

(To ensure that different threads, which process the tasks, run on separate CPU-cores, you can set the environment variable JULIA_EXCLUSIVE=1 or use a package like ThreadPinning.jl. Disclaimer: I’m the author of this package :))

You may also want to consider using FLoops.jl, which allows you to switch between multithreading (above) and multiprocessing (OP) by simply choosing a different “Executor”.
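If you stay with multiprocessing, note that the hand-written loop in the question starts one feeder task per worker process, and each feeder pulls the next index as soon as its worker is free. `pmap` from the Distributed standard library schedules work in the same dynamic way, so a sketch like the following may be a simpler alternative. The worker count and the function `work` are assumptions made for illustration.

```julia
using Distributed
addprocs(2)  # assumption: two local workers; adjust to the machine at hand

# Hypothetical per-task function; it must be defined on every process.
@everywhere work(i) = (sleep(0.1); (i, myid()))

# pmap hands the next index to whichever worker becomes free
# and returns the results in input order.
results = pmap(work, 1:10)
```

Each entry of `results` pairs the task index with the id of the process that ran it, which makes it easy to see how the tasks were distributed.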

Well, what type of parallel tasks do you have?
