The performance of this code (the task farm version) is somewhat baffling. I started a Julia REPL from the command line and pasted the code in. The session has just one thread available to it, so the tasks run asynchronously rather than in parallel.
I ran @btime calc(10, 10) and got 610.483 ms (2990 allocations: 180.42 KiB).
If I then run @time calc(10, 10) several times in a row, each call takes either about 13.6 seconds or about 0.3 seconds.
Here is an example log (a bit long, sorry):
julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
13.640024 seconds (4.46 k allocations: 1.604 MiB)
10-element Vector{Int64}:
26171
18846
7835
2331
16983
8173
12383
48503
53424
51733
julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
0.316631 seconds (2.97 k allocations: 307.250 KiB)
10-element Vector{Int64}:
15911
17659
43273
5561
48779
7519
16323
36958
6299
38578
julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
13.688530 seconds (4.45 k allocations: 1.604 MiB)
10-element Vector{Int64}:
40780
38798
27300
24115
20972
39100
35428
10316
1090
9087
julia>
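To look at the spread directly instead of reading it off individual @time calls, several timings could be collected in one go. The following is only a minimal sketch: time_runs and standin are hypothetical names that are not part of the original code, and standin is a placeholder workload that would need to be swapped for () -> calc(10, 10) in the real session.

```julia
# Hypothetical helper (not from the original code): call `f` once to warm up,
# then time `n` further calls and return the elapsed seconds of each.
function time_runs(f, n)
    f()  # warm-up call so that compilation time is not counted
    return [@elapsed(f()) for _ in 1:n]
end

# Placeholder workload; in the real session use `() -> calc(10, 10)` instead.
standin() = sum(sqrt, 1:100_000)

times = time_runs(standin, 5)

# Report the thread count alongside the timings, since it affects scheduling.
println("threads: ", Threads.nthreads())
println("min = ", minimum(times), " s, max = ", maximum(times), " s")
```

If the minimum and maximum differ by the same factor as in the log above, that would confirm the two-speed behaviour is reproducible and not an artifact of how @time was invoked.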