There are several issues with this setup.
When using `@spawn` you should wait until the tasks are completed, i.e. a `wait` or `fetch` must be performed before you measure the time. Otherwise you're only measuring the time it takes to `@spawn` the tasks.
The other issue is more subtle. When running interactively in this fashion it may happen that the spawned task runs on the same thread as the REPL. Thus, you will not get the REPL prompt back until the spawned task is complete, effectively running the tasks serially. This is the "congestion" you observe.
You can avoid this by starting Julia with an "interactive" thread, i.e. start with `julia -t 4,1` (or set `JULIA_NUM_THREADS=4,1`).
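You can verify the thread configuration after startup; a quick sketch (the thread-pool query functions are available in recent Julia versions):

```julia
# With `julia -t 4,1`: 4 default-pool threads plus 1 interactive thread.
@show Threads.nthreads(:default)      # threads in the :default pool
@show Threads.nthreads(:interactive)  # threads in the :interactive pool

# Threads.@spawn puts tasks in the :default pool, so they cannot occupy
# the interactive thread the REPL runs on.
@show Threads.threadpool()            # pool of the current task
```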
```julia
using Base.Threads: @spawn

t0 = time();
task1 = @spawn mysum(32);
task2 = @spawn mysum(32);
task3 = @spawn mysum(32);
wait.((task1, task2, task3))
@info "t_elapsed is $(time() - t0)"
```
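If you also need the results, `fetch` waits for a task and returns its value in one step. A sketch, with a stand-in for your `mysum` (I don't have your definition):

```julia
using Base.Threads: @spawn

mysum(n) = sum(1:2^n)  # stand-in for the mysum in your post

t0 = time();
tasks = (@spawn(mysum(32)), @spawn(mysum(32)), @spawn(mysum(32)));
results = fetch.(tasks)  # fetch both waits and returns each task's value
@info "t_elapsed is $(time() - t0)"
```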
```julia
# [Step 2] copy-paste the entire following block to REPL and observe the t_elapsed
t0 = time();
task1 = @spawn mysum(32);
task2 = @spawn mysum(32);
task3 = @spawn mysum(32);
task4 = @spawn mysum(32);
wait.((task1, task2, task3, task4))
@info "t_elapsed is $(time() - t0)"
```
This explicit waiting can also be avoided by enclosing the `@spawn`s in a `@sync` block:
```julia
N = 3;
t0 = time();
@sync for i = 1:N
    @spawn mysum(32)
end;
@info "t_elapsed for N = $N is $(time() - t0)"
```
For your other experiment:
```julia
function CPU_intensive_work(N)
    s = collect(1:99999)
    for i = 1:(10000N)
        s = circshift(s, -3)
    end
    s
end
```
This is not CPU-intensive. Rather, it’s heavy in allocations:
```julia
julia> @time CPU_intensive_work(10);
 10.083843 seconds (300.00 k allocations: 74.513 GiB, 14.06% gc time)
```
The reason is that `circshift(s, -3)` creates a new vector on every iteration. Allocation in itself does not take much time, but the garbage collection does, and it's devastating for parallel tasks. Most of the allocations can be avoided by doing the shift in-place:
```julia
function CPU_intensive_work(N)
    s = collect(1:99999)
    for i = 1:(10000N)
        circshift!(s, -3)  # shifts s in place, no new vector
    end
    s
end
```
```julia
julia> @time CPU_intensive_work(10);
  5.150725 seconds (3 allocations: 781.320 KiB)
```
In general it's better to use `@time` than doing the timing yourself; you then also get information on allocations and on time spent in gc, compilation, and locks. Better yet is to use `@btime` or `@benchmark` from the package BenchmarkTools.jl, or `@b` or `@be` from the package Chairmarks.jl. These macros run the benchmark several times and report statistics over the samples.
Overall, it's a good idea to enclose things in functions. There are two reasons for this. Julia's unit of compilation is the function; that is, functions are compiled to machine code, whereas top-level REPL commands may instead be run in an interpreter.
The other reason is that variables you create in the REPL are global. When non-`const` global variables are used, the compiler can't make any assumptions about their type or value, resulting in slow code.
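To see the effect, compare summing over a non-`const` global with passing the same data as an argument (a minimal sketch; the exact timings will vary):

```julia
x = rand(10_000);

# Reads the non-const global x: the compiler can't assume its type,
# so every iteration goes through dynamic dispatch and allocates.
function sum_global()
    s = 0.0
    for v in x
        s += v
    end
    s
end

# Same loop, but the data arrives as an argument, so the compiler
# can specialize the loop on its concrete type.
function sum_arg(y)
    s = 0.0
    for v in y
        s += v
    end
    s
end

@time sum_global();  # many allocations
@time sum_arg(x);    # essentially allocation-free, much faster
```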