I have a function that computes the mean of each column of a matrix (stored as a vector of 20 column vectors) 100 times. However, the parallel version is no faster than the serial version on my machine (a 3.5 GHz Intel Core i5 with 4 cores); in fact, it is slightly slower:
```julia
using Distributed          # addprocs, @everywhere, pmap (Julia ≥ 0.7)
addprocs(3)
@everywhere using Statistics  # mean must be available on every worker

@everywhere function repeated_mean(x)
    for _ in 1:100
        mean(x)
    end
    return x
end

function f(X)
    @time map(repeated_mean, X)
    @time pmap(repeated_mean, X)
    return nothing
end

f([rand(10_000_000) for j in 1:20])
#> 8.432926 seconds (2 allocations: 256 bytes)
#> 9.528860 seconds (377.27 k allocations: 1.524 GiB, 8.65% gc time)
```
What is happening? I naively expected the parallel version to be 4 times faster. Is there a better way to exploit multiple cores in this kind of computation?
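For comparison, here is a minimal shared-memory sketch to try alongside `pmap`, assuming Julia ≥ 1.3 (for `Threads.@spawn`) started with several threads, e.g. `julia --threads 4` (Julia ≥ 1.5) or via the `JULIA_NUM_THREADS` environment variable. The helper name `threaded_map` is hypothetical. Unlike `pmap`, which runs on separate worker processes, the tasks here all read the same `X`, so no column is serialized and copied between processes. Whether it actually gets close to a 4× speedup is not guaranteed, since repeatedly averaging 80 MB vectors may be limited by memory bandwidth rather than by core count.

```julia
using Statistics

# Same kernel as in the question: compute the mean 100 times, return the input.
function repeated_mean(x)
    for _ in 1:100
        mean(x)
    end
    return x
end

# Hypothetical helper: spawn one task per column on the thread pool, then wait
# for all of them. Tasks share memory with the caller, so X is not copied.
# With a single thread this still works, it just runs sequentially.
function threaded_map(f, X)
    tasks = [Threads.@spawn f(x) for x in X]
    return fetch.(tasks)
end

# Example: compare with the serial map (timings will vary by machine)
X = [rand(1_000_000) for _ in 1:20]
@time map(repeated_mean, X)
@time threaded_map(repeated_mean, X)
```

Because `repeated_mean` returns its input, the threaded result should be identical to `X`; with processes, by contrast, each returned column has to be shipped back to the master.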