Dear All, I have a question about the large difference in reported memory allocation between parallel and serial execution. I was experimenting with the Walk on Spheres method for the Poisson equation, which is an embarrassingly parallel computation. `@time` reports the following allocation amounts:
Serial execution
64.305986 seconds (1.40 G allocations: 145.770 GiB, 4.98% gc time)
0.2248277002962166
Parallel execution
40.972257 seconds (253.77 k allocations: 13.657 MiB, 0.01% gc time)
0.2248370898353182
Why is there so much difference? The code is below:
> numThreads=2
> addprocs(numThreads)
> @everywhere function Wos(N::Int,x0::Array{Float64,1})
>     d=length(x0)
>     tol=1/sqrt(N)
>     w=0.0
>     for j=1:N
>         r=Inf
>         x=x0
>         while(r>tol)
>             r=minimum(1-abs.(x))
>             z=randn(d)
>             x=x+r*z/norm(z)
>         end
>         #Projection to Γ
>         imin=indmin(1-abs.(x))
>         x[imin]=sign(x[imin])
>         w=w+(x'*x)/2/d
>     end
>     return w/N
> end
>
>
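> function WosParallel(N::Int, x0::Array{Float64,1}, numThreads::Int)
>     # Integer division: Int(N/numThreads) throws an InexactError when N is not divisible by numThreads
>     chunksize=div(N,numThreads)
>     w=@parallel (+) for i=1:numThreads
>         # The last worker also takes the remainder
>         n=(i==numThreads) ? N-(i-1)*chunksize : chunksize
>         Wos(n,x0)
>     end
>     return w/numThreads
> end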
>
> println("Serial execution")
> @time w= Wos(10000000,[0.0,0.0,0.0])
> println("$w")
>
> println("Parallel execution")
> @time w=WosParallel(10000000,[0.0,0.0,0.0],numThreads)
> println("$w")