I am trying to parallelize a simple function (doTheTransit in the MWE below). I started Julia from the terminal with -p 6, since I have a 6-core processor.
Explanation of my MWE:
With the function createNecessary I construct a vector of tuples; each tuple consists of a dictionary and a vector of indices. The function doTheTransit takes a state and one of these tuples as input. In general, I want to compute a successor state to the input state. Each element of the state has its own transition function, and therefore its own successor element; the i-th successor element is determined by certain elements of the current state, which are selected by idx in createNecessary. The dictionary built in createNecessary maps every possible combination of the values at idx (the keys) to a successor value for the i-th element (the value).
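To make this concrete, here is a hypothetical example of what one such tuple might look like for idx = [2, 3] (in the real code both the indices and the values are random):

# Purely illustrative: the keys are all 0/1 combinations for the
# positions in idx, the values are the successor value for element i.
dicti = Dict([0, 0] => 1, [1, 0] => 0, [0, 1] => 1, [1, 1] => 0)
idx = [2, 3]
# For state = [1, 0, 1], the i-th successor element would be
# dicti[state[idx]] = dicti[[0, 1]] = 1.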
I now want to use map to iterate over the vector of tuples and build the successor state by computing each successor element separately with doTheTransit.
This task seems perfect for parallelization, but with pmap I get far more allocations and a much longer runtime than with the non-parallel map. I have also included Folds.map and ThreadsX.map, since these are also supposed to parallelize such a task; both need much less time than pmap but are no faster than map.
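For reference, pmap also accepts a batch_size keyword that groups many small work items into one message per worker; a minimal sketch using the names from the MWE below (I have not included its timing here):

@time pmap(fun -> doTheTransit(fun, state), funcis; batch_size=div(length(funcis), nworkers()))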
I am very surprised by this outcome, since I had hoped to reduce the runtime of my functions by parallelizing this computation.
Am I doing something wrong, or why does the runtime increase?
using Distributed, Folds, ThreadsX

@everywhere function createNecessary(n)
    # Build n tuples of (transition dictionary, index vector).
    vectorOfTuple = Vector{Tuple{Dict{Vector{Int64}, Int64}, Vector{Int}}}(undef, n)
    randi = rand(1:(n-1), n)  # number of inputs for each element
    for i in 1:n
        dicti = Dict{Vector{Int64}, Int64}()
        idx = rand(1:n, randi[i])  # which state elements determine element i
        # Enumerate all 2^randi[i] input combinations as bit vectors (keys)
        # and assign each a random successor value 0 or 1.
        for j in 0:((1 << randi[i]) - 1)
            dicti[digits(j, base=2, pad=randi[i])] = rand(0:1)
        end
        vectorOfTuple[i] = (dicti, idx)
    end
    return vectorOfTuple
end

@everywhere function doAllTransitions(funcis, state)
    println("Map:")
    @time map(fun -> doTheTransit(fun, state), funcis)
    println("Pmap:")
    @time pmap(fun -> doTheTransit(fun, state), funcis)
    println("Folds.map:")
    @time Folds.map(fun -> doTheTransit(fun, state), funcis)
    println("ThreadsX.map:")
    @time ThreadsX.map(fun -> doTheTransit(fun, state), funcis)
end

@everywhere function doTheTransit(func, state)
    # Look up the successor value for this element: select the relevant
    # state entries via the index vector, then use them as the dict key.
    tempi = state[func[2]]
    return func[1][tempi]
end

doAllTransitions(createNecessary(25), rand(0:1, 25))
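For context, this is how the parallel resources of the session can be checked; pmap distributes over the worker processes, while Folds.map and ThreadsX.map use threads:

println("workers: ", nworkers())          # processes added via -p
println("threads: ", Threads.nthreads())  # threads available to Folds/ThreadsX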
The @time results were the following:
Map:
0.000758 seconds (26 allocations: 4.031 KiB)
Pmap:
810.314207 seconds (46.93 M allocations: 11.876 GiB, 27.62% gc time, 0.01% compilation time)
Folds.map:
0.076025 seconds (5.96 k allocations: 338.998 KiB, 81.34% compilation time)
ThreadsX.map:
0.090041 seconds (7.45 k allocations: 413.728 KiB, 97.14% compilation time)
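Since the Folds.map and ThreadsX.map timings above are dominated by compilation (81% and 97% compilation time), a second call, or BenchmarkTools, would measure the steady-state runtime instead; a sketch, assuming BenchmarkTools is installed:

using BenchmarkTools
funcis = createNecessary(25)
state = rand(0:1, 25)
@btime map(fun -> doTheTransit(fun, $state), $funcis)
@btime Folds.map(fun -> doTheTransit(fun, $state), $funcis)
@btime ThreadsX.map(fun -> doTheTransit(fun, $state), $funcis)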