I see. The CachingPool contains the data in the closure. It is still unclear to me how it decides which data to send. Is it all the data available to the function?
For my use case I need each process to use a different chunk of the data. I found a way to do this below by closing over a vector of arrays and passing an index to each process. However, I’m pretty sure this still sends the entire dataset to each node, when only a chunk is necessary. This should be fine because it’s only sent once, but it might be a problem if the data was really huge.
The example below looks more like a maximum likelihood problem because I loop over multiple thetas, keeping the same data. The CachingPool
seems to work very well. The parallel speedup is close to the maximum for 4 cores.
julia> addprocs(4)
4-element Array{Int64,1}:
2
3
4
5
julia> function run_serial(big_data)
f(theta, n) = sum(x^theta for x in big_data[n])
x = zeros(4)
b = ones(4)
for theta in 1:.01:2
x = map(f, theta*b, 1:4)
end
sum(x)
end
run_serial (generic function with 1 method)
julia> function run_parallel(big_data)
f(theta, n) = sum(x^theta for x in big_data[n])
x = zeros(4)
b = ones(4)
for theta in 1:.01:2
x = pmap(f, theta*b, 1:4)
end
sum(x)
end
run_parallel (generic function with 1 method)
julia> function run_cached(big_data)
f(theta, n) = sum(x^theta for x in big_data[n])
wp = CachingPool(workers())
x = zeros(4)
b = ones(4)
for theta in 1:.01:2
x = pmap(wp, f, theta*b, 1:4)
end
sum(x)
end
run_cached (generic function with 1 method)
julia> big_data = [rand(100000), rand(100000), rand(100000), rand(100000)]
4-element Array{Array{Float64,1},1}:
[0.634355, 0.528308, 0.201827, 0.769327, 0.962185, 0.188918, 0.096454, 0.00439114, 0.262142, 0.338069 … 0.568727, 0.114778, 0.913829, 0.428653, 0.288401, 0.465263, 0.73819, 0.62727, 0.934231, 0.586302]
[0.584522, 0.870068, 0.23399, 0.789817, 0.464353, 0.728496, 0.419617, 0.770256, 0.747527, 0.935073 … 0.0398883, 0.647269, 0.969051, 0.0330922, 0.639597, 0.430352, 0.687566, 0.100313, 0.779581, 0.104456]
[0.923503, 0.437401, 0.127655, 0.30547, 0.298625, 0.266635, 0.856103, 0.0752927, 0.552258, 0.467141 … 0.0554492, 0.635918, 0.394429, 0.858632, 0.624893, 0.372602, 0.433335, 0.586451, 0.340953, 0.836536]
[0.311625, 0.696548, 0.88832, 0.472612, 0.567086, 0.553389, 0.682172, 0.618924, 0.0917297, 0.279764 … 0.595471, 0.42165, 0.0448332, 0.475926, 0.622473, 0.0604174, 0.442415, 0.32164, 0.407013, 0.277669]
julia> run_serial(big_data)
133398.25067928832
julia> run_parallel(big_data)
133398.25067928832
julia> run_cached(big_data)
133398.25067928832
julia> @time run_serial(big_data)
3.005636 seconds (2.41 k allocations: 72.623 KiB)
133398.25067928832
julia> @time run_parallel(big_data)
1.329272 seconds (100.20 k allocations: 9.717 MiB, 0.40% gc time)
133398.25067928832
julia> @time run_cached(big_data)
0.861480 seconds (55.51 k allocations: 6.668 MiB)
133398.25067928832