What is the recommended way to run N workers asynchronously?

The performance of this code (the task farm version) is somewhat baffling. I started a Julia REPL from the command line and pasted the code into it. The session has only one thread, so the workers run asynchronously: they are tasks that interleave on that single thread rather than running in parallel.
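For anyone who has not seen the code referred to above, here is a minimal sketch of what a single-threaded task farm can look like in Julia. It is not the `calc` being timed below. The function name `task_farm`, its arguments, and the `sleep(0.01)` stand-in for real work are all hypothetical. N worker tasks started with `@async` pull job indices from a shared `Channel` and push results into a second one, while `@sync` waits for every worker to finish.

```julia
# Hypothetical sketch of a task farm; NOT the calc() benchmarked in this post.
function task_farm(nworkers::Int, njobs::Int)
    jobs = Channel{Int}(njobs)               # queue of job indices
    results = Channel{Tuple{Int,Int}}(njobs) # (job index, result) pairs

    for j in 1:njobs
        put!(jobs, j)
    end
    close(jobs)  # workers stop once the closed channel is drained

    # @sync waits for every task spawned with @async inside it.
    @sync for _ in 1:nworkers
        @async for j in jobs
            sleep(0.01)             # placeholder for real (I/O-bound) work
            put!(results, (j, j^2)) # placeholder result
        end
    end
    close(results)

    # Reassemble results in job order.
    out = zeros(Int, njobs)
    for (j, v) in results
        out[j] = v
    end
    return out
end

task_farm(4, 10)
```

On a single thread these tasks only switch at yield points such as `sleep`, I/O, or a blocking `put!`/`take!`, so purely CPU-bound work gains nothing from this pattern.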

I ran @btime calc(10, 10) (from BenchmarkTools) and got a minimum time of 610.483 ms (2990 allocations: 180.42 KiB).

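If I then run @time calc(10, 10) several times, each call takes either about 13.6 seconds or about 0.3 seconds. The slow runs also allocate noticeably more (about 4.45 k allocations and 1.604 MiB, versus 2.97 k allocations and 307 KiB for the fast run).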

Here is an example log (a bit long, sorry):

julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
 13.640024 seconds (4.46 k allocations: 1.604 MiB)
10-element Vector{Int64}:
 26171
 18846
  7835
  2331
 16983
  8173
 12383
 48503
 53424
 51733

julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
  0.316631 seconds (2.97 k allocations: 307.250 KiB)
10-element Vector{Int64}:
 15911
 17659
 43273
  5561
 48779
  7519
 16323
 36958
  6299
 38578

julia> @time calc(10, 10)
worker-1
worker-2
worker-3
worker-4
worker-5
worker-6
worker-7
worker-8
worker-9
worker-10
 13.688530 seconds (4.45 k allocations: 1.604 MiB)
10-element Vector{Int64}:
 40780
 38798
 27300
 24115
 20972
 39100
 35428
 10316
  1090
  9087

julia>