I have an array `a` defined as:

julia> a = collect(1:10000)
I can map a squaring function over the array and time the execution as follows:
julia> @time map((x)->x^2, a)
On my machine, the line above runs in 0.05 seconds.
I then try to see the performance gain by running processes in parallel:
julia> using Distributed
julia> addprocs(4)
julia> @time pmap((x)->x^2, a)
and I find that `pmap` runs in 0.5 seconds, which makes it 10 times slower than `map`. The documentation says that `pmap` "is designed for the case where each function call does a large amount of work", whereas `@distributed for` "can handle situations where each iteration is tiny, perhaps merely summing two numbers."
I assume this is the reason for the slowdown, since my function merely squares a number. But what is the deeper explanation? Why does `pmap` perform poorly when each function call does little work? And, more importantly, how can I efficiently parallelize mapping a small function over a large array, as in the example above?
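For reference, here is a sketch of the `@distributed` pattern the docs mention, written as a reduction over a sum of squares (I haven't benchmarked it; and I understand that to produce an output array, rather than a reduced value, one would need something like a `SharedArray` instead):

    using Distributed
    addprocs(4)

    # Reduction form from the docs: each worker computes a partial
    # sum of x^2 over its chunk of the range, and (+) combines the
    # per-worker partial sums into one total.
    total = @distributed (+) for x in 1:10000
        x^2
    end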