I'm trying @threads and @distributed on my favourite parallel test case, and I can't figure out why @threads is so slow and allocates so much memory, or why @distributed is so slow as well.
using Base.Threads

function genFractal(α, n=100)
    niter = 255 # max number of iterations
    threshold = 10.0 # limit of |z|, before label as divergence
    len = 3.0  # len^2 is area of picture
    xmin, ymin = -1.5, -1.5
    ymax = ymin + len
    ax = len/n
    z::Complex = 0.0
    zinit::Complex = α > 0 ? 0.0 : 1.0+im
    count = Array{Int,2}(undef,n,n)
    @threads for j in 1:n
        cy = ymax - ax*j
        for i in 1:n
            cx = ax*i + xmin
            c = cx + im*cy
            nk = niter
            z = zinit
            for k in 1:niter
                if abs(z) < threshold
                    z = z^α + c
                else
                    nk = k-1
                    break
                end
            end
            @inbounds count[i,j] = nk
        end
    end
    return count
end
frac = genFractal(2,1000);
@time genFractal(2,1000);
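For reference, the thread count has to be set when Julia is launched (the invocation isn't shown above); a typical session would look something like this, with the -t flag available from Julia 1.5 on:

$ JULIA_NUM_THREADS=2 julia     # or, on Julia 1.5 and later: julia -t 2
julia> Threads.nthreads()
2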
Running this without @threads on my MacBook Pro gives:
1.609887 seconds (6 allocations: 7.630 MiB)
Now with @threads and 2 threads:
6.242670 seconds (125.02 M allocations: 3.098 GiB, 4.51% gc time)
Look at the allocations and memory used! I got much better results by turning that outer loop into recursive calls and using @spawn: 0.815582 seconds with 2 threads.
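For what it's worth, here is a hedged sketch of the kind of recursive @spawn version I mean. The names (spawncols!, fillcol!, cutoff) and the parameter tuple are illustrative, not my exact code:

using Base.Threads: @spawn

# Serial escape-time loop for a single column j of the image.
function fillcol!(count, j, p)
    cy = p.ymax - p.ax*j
    for i in axes(count, 1)
        c = (p.ax*i + p.xmin) + im*cy
        z = p.zinit
        nk = p.niter
        for k in 1:p.niter
            if abs(z) < p.threshold
                z = z^p.α + c
            else
                nk = k - 1
                break
            end
        end
        @inbounds count[i, j] = nk
    end
    return count
end

# Split the column range in half, hand one half to a spawned task,
# recurse on the other half, and do small chunks serially.
function spawncols!(count, jrange, p; cutoff = 16)
    if length(jrange) <= cutoff
        for j in jrange
            fillcol!(count, j, p)
        end
    else
        mid = (first(jrange) + last(jrange)) ÷ 2
        t = @spawn spawncols!(count, first(jrange):mid, p; cutoff = cutoff)
        spawncols!(count, (mid + 1):last(jrange), p; cutoff = cutoff)
        wait(t)
    end
    return count
end

# Same parameters as genFractal(2, 1000):
p = (α = 2, niter = 255, threshold = 10.0, xmin = -1.5, ymax = 1.5,
     ax = 3.0/1000, zinit = 0.0 + 0.0im)
frac2 = spawncols!(Array{Int,2}(undef, 1000, 1000), 1:1000, p)

The cutoff just stops the recursion from creating tiny tasks; with 2 threads each one ends up with roughly half the columns.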
@distributed is not much better. The count array is now a SharedArray, and the outer loop uses @sync @distributed (the loop body is otherwise unchanged):
count = SharedArray{Int,2}(n,n)
@sync @distributed for j in 1:n
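To be explicit about the setup (not shown above), the pattern I'm using looks roughly like this; the kernel here is a stand-in, not the fractal code:

using Distributed, SharedArrays
@everywhere using SharedArrays        # make sure the workers can see the type too

n = 1000
count = SharedArray{Int,2}(n, n)      # one buffer in shared memory, writable by every worker
@sync @distributed for j in 1:n       # columns are split across the -p workers; @sync waits for all of them
    for i in 1:n
        count[i, j] = i + j           # stand-in for the escape-time kernel
    end
end
result = sdata(count)                 # the plain Array backing the SharedArray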
Starting Julia with -p 1 gives:
33.244954 seconds (1.26 k allocations: 67.547 KiB)
and starting with -p 2 gives:
16.797049 seconds (598 allocations: 24.391 KiB)
So it scales with the number of worker processes, but it is much slower than the sequential run.
Okay, that makes sense, thanks. The results hadn't been affected, though (at least not the fractal, visually).