Memory allocation in multi-threaded vs. single-threaded execution

Dear all, I have a question about the large difference in reported memory allocation between multi-threaded and single-threaded runs. I was experimenting with the Walk on Spheres method for the Poisson equation, which is an embarrassingly parallel computation. I get memory allocation figures like:

Serial execution
64.305986 seconds (1.40 G allocations: 145.770 GiB, 4.98% gc time)
0.2248277002962166
Parallel execution
40.972257 seconds (253.77 k allocations: 13.657 MiB, 0.01% gc time)
0.2248370898353182

Why is there so much difference? The code is below:

> numThreads=2
> addprocs(numThreads)
> @everywhere function Wos(N::Int,x0::Array{Float64,1})
>     d=length(x0)
>     tol=1/sqrt(N)
>     w=0.0
>     for j=1:N
>         r=Inf
>         x=x0
>         while(r>tol)
>             r=minimum(1-abs.(x))
>             z=randn(d)
>             x=x+r*z/norm(z)
>         end
>         #Projection to Γ
>         imin=indmin(1-abs.(x))
>         x[imin]=sign(x[imin])
>         w=w+(x'*x)/2/d
>     end
>     return w/N
> end
> 
> 
> function WosParallel(N::Int, x0::Array{Float64,1}, numThreads::Int)
>     chunksize=div(N,numThreads)  # integer division; Int(N/numThreads) throws when N is not divisible by numThreads
>     w=@parallel (+) for i=1:numThreads  
>         if(i==numThreads)
>             chunksize=N-(i-1)*chunksize
>         end
>         Wos(chunksize,x0)
>     end
>     return w/numThreads
> end
> 
> println("Serial execution")
> @time w= Wos(10000000,[0.0,0.0,0.0])
> println("$w")
> 
> println("Parallel execution")
> @time w=WosParallel(10000000,[0.0,0.0,0.0],numThreads)
> println("$w")
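
One thing worth checking is where the allocations are counted: `@time` and `@allocated` only report memory allocated in the process that calls them, so work done on workers started with `addprocs` does not show up in the master's figure. The sketch below illustrates this under stated assumptions: it targets Julia 1.x with the `Distributed` standard library (the code above uses the older `@parallel` syntax), and `alloc_heavy` is a hypothetical stand-in for `Wos`, not the original function.

```julia
using Distributed

# Assumption: Julia 1.x. Start one worker if none exists yet.
nprocs() == 1 && addprocs(1)

# Hypothetical stand-in for Wos: allocates a fresh 3-element vector per iteration.
@everywhere function alloc_heavy(n::Int)
    s = 0.0
    for _ in 1:n
        z = randn(3)          # heap allocation on every iteration
        s += sum(abs2, z)
    end
    return s / n
end

wid = workers()[1]

# Warm up so that compilation is not included in the measurements.
alloc_heavy(10)
remotecall_fetch(alloc_heavy, wid, 10)

# Bytes allocated in the master when the work runs locally.
local_bytes = @allocated alloc_heavy(100_000)

# Bytes allocated in the master when the same work runs on the worker:
# only the messaging overhead is counted here.
master_bytes = @allocated remotecall_fetch(alloc_heavy, wid, 100_000)

# Bytes allocated on the worker itself, measured there and sent back.
worker_bytes = remotecall_fetch(() -> @allocated(alloc_heavy(100_000)), wid)

println("local: $local_bytes, master view of remote run: $master_bytes, worker: $worker_bytes")
```

If the worker-side figure is comparable to the local one while the master's view of the remote run is small, the gap in the `@time` output reflects where the measurement is taken rather than how much memory the computation needs in total.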