Trying `@threads` and `@distributed` with my favourite parallel test case. I can't figure out why `@threads` is so slow and allocates so much memory, or why `@distributed` is also slow.
```julia
using Base.Threads

function genFractal(α, n=100)
    niter = 255                 # max number of iterations
    threshold = 10.0            # limit on |z| before a point is labelled divergent
    len = 3.0                   # len^2 is the area of the picture
    xmin, ymin = -1.5, -1.5
    ymax = ymin + len
    ax = len/n                  # pixel size
    z::Complex = 0.0
    zinit::Complex = α > 0 ? 0.0 : 1.0 + im
    count = Array{Int,2}(undef, n, n)
    @threads for j in 1:n       # parallelise over columns
        cy = ymax - ax*j
        for i in 1:n
            cx = ax*i + xmin
            c = cx + im*cy      # point in the complex plane for pixel (i, j)
            nk = niter
            z = zinit
            for k in 1:niter    # iterate z -> z^α + c until |z| escapes
                if abs(z) < threshold
                    z = z^α + c
                else
                    nk = k - 1
                    break
                end
            end
            @inbounds count[i,j] = nk
        end
    end
    return count
end

frac = genFractal(2, 1000);     # warm-up call, so @time doesn't measure compilation
@time genFractal(2, 1000);
```
Running this without the `@threads` on my MacBook Pro gives:

```
1.609887 seconds (6 allocations: 7.630 MiB)
```
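
For anyone reproducing the threaded runs: the thread count has to be set before Julia starts (`JULIA_NUM_THREADS=2` in the environment, or `julia -t 2` on newer versions), and can be checked from within the session:

```julia
using Base.Threads
nthreads()    # should report 2 for the threaded run below
```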
Now with `@threads` and 2 threads:

```
6.242670 seconds (125.02 M allocations: 3.098 GiB, 4.51% gc time)
```
Look at the allocations and memory used! I've got much better results by turning that outer loop into recursive calls and using `@spawn`, getting 0.815582 seconds for 2 threads.
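
The idea of that variant is to split the column range in half, `@spawn` one half as a task, recurse into the other, and fall back to a plain serial loop below some cutoff. A sketch (the name `fillcols!` and the cutoff of 64 columns are arbitrary choices here; the per-pixel body is the same as above):

```julia
using Base.Threads: @spawn

# Divide and conquer over columns: @spawn one half, recurse on the other,
# and fill small ranges serially.
function fillcols!(count, jlo, jhi, n, ax, xmin, ymax, zinit, α, niter, threshold)
    if jhi - jlo < 64                       # small enough: plain serial loop
        for j in jlo:jhi
            cy = ymax - ax*j
            for i in 1:n
                c = ax*i + xmin + im*cy
                z = zinit
                nk = niter
                for k in 1:niter
                    if abs(z) < threshold
                        z = z^α + c
                    else
                        nk = k - 1
                        break
                    end
                end
                @inbounds count[i, j] = nk
            end
        end
    else                                    # split, and run one half as a task
        mid = (jlo + jhi) >>> 1
        t = @spawn fillcols!(count, jlo, mid, n, ax, xmin, ymax, zinit, α, niter, threshold)
        fillcols!(count, mid + 1, jhi, n, ax, xmin, ymax, zinit, α, niter, threshold)
        wait(t)
    end
    return count
end
```

Note that in this structure `z` is a local of the loop body rather than a variable captured from the enclosing function's scope.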
`@distributed` is not much better. The array `count` is now a `SharedArray` (which needs `using Distributed, SharedArrays`):

```julia
count = SharedArray{Int,2}(n, n)
```

and the outer loop becomes:

```julia
@sync @distributed for j in 1:n
```
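
Assembled, the distributed variant is the same function with just those lines changed; for completeness, a sketch (`genFractalDist` is a hypothetical name to keep the two versions apart):

```julia
using Distributed, SharedArrays

function genFractalDist(α, n=100)
    niter = 255
    threshold = 10.0
    len = 3.0
    xmin, ymin = -1.5, -1.5
    ymax = ymin + len
    ax = len/n
    z::Complex = 0.0
    zinit::Complex = α > 0 ? 0.0 : 1.0 + im
    count = SharedArray{Int,2}(n, n)   # shared between the driver and the workers
    @sync @distributed for j in 1:n    # columns are partitioned across workers
        cy = ymax - ax*j
        for i in 1:n
            c = ax*i + xmin + im*cy
            z = zinit
            nk = niter
            for k in 1:niter
                if abs(z) < threshold
                    z = z^α + c
                else
                    nk = k - 1
                    break
                end
            end
            @inbounds count[i, j] = nk
        end
    end
    return count
end
```

With `-p N` the loop iterations run only on the N worker processes, so `-p 1` is the distributed analogue of the sequential run.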
Starting Julia with `-p 1` gives:

```
33.244954 seconds (1.26 k allocations: 67.547 KiB)
```
and starting with `-p 2` gives:

```
16.797049 seconds (598 allocations: 24.391 KiB)
```

So it scales, but it is much slower than the sequential run.