Threads, @spawn on Mac - inconsistent behavior

I have a school project to compute FFTs with Julia. I need to show that running a 1D FFT on an image is faster when run in parallel across the processor’s cores than when run as a single instance. Cores, threads… in reality I just need to show parallel work being done and that it is faster.

I’ve read many, many things and tried several samples from websites, and this is what I’ve run into:

When I run my FFT for a single instance, I can see on my Activity Monitor, multiple threads running on multiple cores (3 of 6).
When I run my FFT for parallel, I can see on my Activity Monitor, multiple threads running on multiple cores (6 of 6), but not evenly loaded (this is okay).
BUT it takes the same amount of time either way. This is not okay.

From this forum, I found: How to Maximize CPU Utilization - @spawn Assigning to Busy Workers - Use pmap Instead - #5 by pbayer
When I run all the code by ‘pbayer’, I initially get similar times, but for the last few runs, the posted times were 13 and 14 seconds while my machine takes 98 and 90 seconds.

I’m at a total loss as to why, no matter what I do, running with or without parallel processes, most things take the same amount of time, and in some cases running in parallel takes longer.
There is one instance using @distributed that appeared quicker, but I’ve not been able to replicate it.
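I don’t have the exact code anymore, but the pattern was roughly a @distributed reduction over the columns; this is my reconstruction of it, not the original (the worker count and matrix size here are placeholders):

```julia
using Distributed
addprocs(2)                 # start worker processes (count is arbitrary here)
@everywhere using FFTW      # workers need FFTW loaded too

A = rand(256, 256)

# @distributed with a (+) reducer runs the loop body on the workers
# and sums the per-column results back on the main process.
s = @distributed (+) for k in 1:size(A, 2)
    sum(abs2, fft(A[:, k]))
end
println(s)
```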

If I run

julia> using LinearAlgebra
julia> const a = rand(10000, 10000)
julia> svdvals(a)  # or svd(a)

then Activity Monitor shows all the cores maxed out, even though I’ve not told it to do anything in parallel.
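My guess (unconfirmed) is that this comes from the BLAS library behind svdvals, which runs its own thread pool independently of Julia’s threads. Assuming a Julia version where `BLAS.get_num_threads` exists, this checks and pins the BLAS thread count for a fair single-core baseline:

```julia
using LinearAlgebra

# svd/svdvals call into OpenBLAS, which has its own thread pool
# independent of Julia's task threads.
println("BLAS threads: ", BLAS.get_num_threads())

# Pin BLAS to one thread so a "serial" timing really uses one core.
BLAS.set_num_threads(1)
```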

I’m on a 2018 MacBook Pro with MacOSX Catalina. I’m using Atom as my editor and then executing code from the Mac terminal.

I’m not asking for anyone to do my school work. I am asking for resources that I can use to understand what is going on with getting the same times or slower times when running in parallel.

Here is the code (incomplete for school):
#=


=#

using FFTW
using LinearAlgebra
using Images, ImageView, ImageInTerminal
using FileIO #QuartzImageIO is supposedly better for MacOSX

#WARNING: using Distributed.@spawn in module Main conflicts with an existing identifier.
using Distributed
import Base.Threads.@spawn

function JoeD(A)
    N = size(A, 2)
    JoeHat = complex.(A)              # fft returns complex values
    for k = 1:N
        JoeHat[:, k] = fft(A[:, k])   # store each column's FFT
    end
    return JoeHat
end

function Joe(A, J)
    N = size(A, 2)
    JoeHat = complex.(A)
    for k = J:6:N                     # every 6th column, offset by J
        JoeHat[:, k] = fft(A[:, k])
    end
    return JoeHat
end

A = load("nasa.jpg")
#println(size(A))
MyRGB=convert(Array{Float64},channelview(A))
colorview(RGB,MyRGB)
#println(size(MyRGB))
R=MyRGB[1,:,:]
G=MyRGB[2,:,:]
B=MyRGB[3,:,:]

@time begin
    JoeD(R)
    JoeD(G)
    JoeD(B)
end

@time begin
    # Keep every task handle: reassigning core1..core6 for the G and B
    # channels discarded the R and G tasks, so only the last six tasks
    # were actually waited on.
    tasks = Task[]
    for channel in (R, G, B)
        for j in 1:6
            push!(tasks, Threads.@spawn Joe(channel, j))
        end
    end
    foreach(wait, tasks)
end
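For reference, one of the samples I tried writes the same column-by-column FFT with Threads.@threads instead of hand-assigning tasks; this is a sketch on a random matrix, not my actual image code:

```julia
using FFTW
using Base.Threads

# Split the columns of a matrix across threads. Each thread writes
# into its own columns of the complex output, so no locking is needed.
function threaded_fft_columns(A::AbstractMatrix{<:Real})
    out = Matrix{ComplexF64}(undef, size(A))
    Threads.@threads for k in 1:size(A, 2)
        out[:, k] = fft(A[:, k])
    end
    return out
end

M = rand(256, 256)
println(size(threaded_fft_columns(M)))
```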

Welcome! Your post is difficult to read. I suggest you edit it following …

That may be due to load imbalance. If you Threads.@spawn several tasks doing IO operations, the load can become very imbalanced (as in that example). This happens because the tasks yield immediately after starting, and the Julia scheduler then starts the next task on the same thread.

Without having investigated further, I guess this may also be the case in your problem, since your functions/tasks get a MyRGB[....] argument.
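A quick way to see the scheduling yourself: spawn a few tasks and record which thread each one runs on. Note also that Julia starts with a single thread unless you launch it with JULIA_NUM_THREADS set (e.g. JULIA_NUM_THREADS=6), in which case every task reports thread 1 and nothing runs in parallel. A minimal sketch:

```julia
using Base.Threads

# Each task simply reports the thread it actually ran on.
tasks = [Threads.@spawn threadid() for _ in 1:6]
ids = fetch.(tasks)
println("nthreads = ", nthreads(), ", tasks ran on threads ", ids)
```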