Help with parallel computing for a simple loop with a large function

Hi, I am relatively new to Julia and wish to implement parallel computing in my code.

Currently, I have code that calls a large function f(x) inside a loop.
For example,

for i = 1:N 
     a[ i ] = f(x[ i ])
end
...
...
C = C + a

N is usually small, typically N < 6.
Note that I need to retrieve “a” at some point after the loop.
How do I implement parallel computing for the for loop and ensure that I will get the right “a”?

Since you have indicated your function is large (==expensive?), you could look at pmap:

a = pmap(f, x) 

pmap is pretty simple & safe, and a good choice for relatively expensive functions.

Here’s a mini example:

using Distributed
addprocs()
@everywhere f(x) = x*x
x = [2,3,6,8,11]
a = pmap(f, x)

Another option would be a @distributed for loop that writes its results into a SharedArray.
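As a rough sketch of that option (assuming two local worker processes and the same toy f as in the pmap example, both of which are placeholders rather than anything from the original question), the loop could look like this:

```julia
using Distributed
addprocs(2)                       # assumption: two local worker processes
@everywhere using SharedArrays

@everywhere f(x) = x * x          # stand-in for the expensive function

x = [2, 3, 6, 8, 11]
a = SharedArray{Int}(length(x))   # memory shared by all local workers

# Without a reducer, @distributed returns immediately;
# @sync blocks until every worker has finished its share of the loop.
@sync @distributed for i in 1:length(x)
    a[i] = f(x[i])
end

C = zeros(Int, length(x))
C = C + Array(a)                  # copy into an ordinary Array before reuse
```

Each iteration writes to its own index of a, so the workers never touch the same element. SharedArrays only work between processes on the same machine, and for a handful of expensive calls pmap is usually the simpler choice.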

See Parallel Computing — Julia Language 0.3.13-pre documentation

Also, Julia v1.3 brings multithreading into the mainstream:
https://docs.julialang.org/en/v1.4-dev/manual/parallel-computing/#man-multithreading-1
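For the loop in the question, a threaded version might look roughly like the sketch below. It assumes Julia was started with more than one thread (for example by setting the JULIA_NUM_THREADS environment variable), and both threaded_map! and the toy f are hypothetical names used only for illustration:

```julia
using Base.Threads

f(x) = x * x                      # stand-in for the expensive function

# Hypothetical helper: fill a[i] = f(x[i]) using the available threads.
function threaded_map!(a, f, x)
    @threads for i in 1:length(x)
        a[i] = f(x[i])            # each iteration writes only its own slot
    end
    return a
end

x = [2, 3, 6, 8, 11]
a = similar(x)                    # preallocated output, same size as x
threaded_map!(a, f, x)

C = zeros(Int, length(x))
C = C + a                         # a is complete once the loop has returned
```

Because every iteration writes to a different index of a, no locking is needed. This only stays safe if f itself does not modify shared state.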


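You can also run the pieces concurrently in Julia 1.2 with @spawn from Distributed. Note that because no worker processes are added with addprocs, @spawn schedules each piece as a task on the main process, so the pieces interleave (here, whenever one of them sleeps) rather than run on separate threads.

Take a one-dimensional array, rawdata, split it into three pieces, and hand each piece to one of 3 tasks that doubles the value of every element.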

using Distributed
# This code is designed for Julia 1.2
struct ThreadStruct
    t::Int64
    data::Vector{Float64}
end

@show rawdata = rand(10)
NumOfThreads = 3
chunksize = div(length(rawdata), NumOfThreads)  # integer division; the last chunk takes the remainder
ArrayOfThreadStruct = ThreadStruct[]
for t = 1:NumOfThreads
    if t < NumOfThreads
        push!( ArrayOfThreadStruct, ThreadStruct(t, rawdata[chunksize*(t-1)+1:chunksize*t]) )
    else
        push!( ArrayOfThreadStruct, ThreadStruct(t, rawdata[chunksize*(t-1)+1:length(rawdata)]) )
    end
end
@show ArrayOfThreadStruct

function WorkTask(t)
    numofelements = length(ArrayOfThreadStruct[t].data)
    result = Array{Float64}(undef,numofelements)
    for c = 1:numofelements
        println("Thread $(t), Working with data = $(ArrayOfThreadStruct[t].data[c])")
        result[c] = 2.0 * ArrayOfThreadStruct[t].data[c]
        sleep((NumOfThreads - t + 1) * rand())
    end
    return result
end

Workers = []
for t = 1:NumOfThreads
    push!(Workers, @spawn WorkTask(t))
end

finalresult = Float64[]
for t = 1:length(Workers)
    global finalresult
    indresult = fetch(Workers[t])
    finalresult = vcat(finalresult,indresult)
end
println("Done")
println("Final result")
print(finalresult)

Here is the output:

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _  |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__ _|_|_|\__ _|  |  Official https://julialang.org/ release
|__/                   |

rawdata = rand(10) = [0.90670278391635, 0.9790199853002819, 0.8945243886382361, 0.6060277141258699, 0.5961610143021103, 0.025213599592954106, 0.6741559615457484, 0.04591064786597654, 0.9994519128410557, 0.7984804721742531]
ArrayOfThreadStruct = ThreadStruct[ThreadStruct(1, [0.90670278391635, 0.9790199853002819, 0.8945243886382361]), ThreadStruct(2, [0.6060277141258699, 0.5961610143021103, 0.025213599592954106]), ThreadStruct(3, [0.6741559615457484, 0.04591064786597654, 0.9994519128410557, 0.7984804721742531])]
Thread 1, Working with data = 0.90670278391635
Thread 2, Working with data = 0.6060277141258699
Thread 3, Working with data = 0.6741559615457484
Thread 3, Working with data = 0.04591064786597654
Thread 2, Working with data = 0.5961610143021103
Thread 2, Working with data = 0.025213599592954106
Thread 1, Working with data = 0.9790199853002819
Thread 1, Working with data = 0.8945243886382361
Thread 3, Working with data = 0.9994519128410557
Thread 3, Working with data = 0.7984804721742531
Done
Final result
[1.8134055678327, 1.9580399706005638, 1.7890487772764723, 1.2120554282517397, 1.1923220286042207, 0.05042719918590821, 1.3483119230914968, 0.09182129573195308, 1.9989038256821114, 1.5969609443485062]
julia> 
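On Julia 1.3 or later, roughly the same split-and-double idea can be written with Threads.@spawn, which schedules each task on any available thread. The sketch below is an assumption-laden variant rather than a port of the code above: double_chunk and ntasks are illustrative names, and the chunks are cut with Iterators.partition, so their sizes come out as 4, 4, 2 instead of 3, 3, 4.

```julia
# Requires Julia 1.3+; start Julia with more than one thread to see a speedup.
using Base.Threads

double_chunk(chunk) = 2.0 .* chunk           # work done by each task

rawdata = rand(10)
ntasks = 3
chunksize = cld(length(rawdata), ntasks)     # ceiling division: at most ntasks chunks
chunks = collect(Iterators.partition(rawdata, chunksize))

# One task per chunk; fetch waits for a task and returns its result.
tasks = [Threads.@spawn double_chunk(c) for c in chunks]
finalresult = reduce(vcat, fetch.(tasks))    # concatenate in the original order
```

Fetching the tasks in the order they were created keeps finalresult aligned with rawdata, regardless of which task finishes first.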

Thanks for the replies! They were helpful.

Hi @drez, can you say more about the requirements of your function and the computing resources you have?

For example, how much data does each run of f generate? How big is your computer in terms of RAM and number of cores?
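If it helps to answer those questions, the numbers can be read from within Julia itself. This is only a small sketch; the toy f and x stand in for the real function and input, which are not shown in the thread:

```julia
using Distributed

f(x) = x * x                      # stand-in for the real function
x = [2, 3, 6, 8, 11]

println("Logical CPU cores: ", Sys.CPU_THREADS)
println("Total RAM (GiB):   ", round(Sys.total_memory() / 2^30, digits = 2))
println("Julia threads:     ", Threads.nthreads())
println("Worker processes:  ", nworkers())

# Approximate size in bytes of what a single call to f returns
result_bytes = Base.summarysize(f(x[1]))
println("Bytes per result:  ", result_bytes)
```

The size of each result matters mostly for pmap and @distributed, where results are copied between processes; with threads the data stays in shared memory.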