I have 28 cores (56 threads) and start Julia with “-p auto”.
Consider first a (non-parallel) example:
@time for x in [10; ones(Int, 10)]
    sleep(x/10)
    println(x)
end
Its execution time (~2 s, i.e. 1 s for the first element plus 10 × 0.1 s for the rest) makes perfect sense.
However, the execution time of the following (equivalent?) parallelized code does not:
@time res = pmap(x -> (sleep(x/10); println(x)), [10;ones(Int, 10)]);
Reproducibly, it takes seven repeated (identical) executions before the time converges to the expected ~1 s (the length of the longest task): 3.2 s, 1.8 s, 1.8 s, 1.8 s, 1.8 s, 1.8 s, 1.1 s, 1.1 s, …
I know Julia compiles functions on first use (hence the 3.2 s), but why does it take seven repeated executions before I get the intended parallel speedup?
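For anyone who wants to reproduce the timings, a script along these lines should collect them in one go. It is only a sketch: it assumes that `addprocs(Sys.CPU_THREADS)` approximates `-p auto` (one worker per logical core) and it uses `@elapsed` instead of `@time` so the numbers can be collected. One difference from typing the `pmap` line repeatedly at the REPL: here the anonymous function is written once inside the loop, so every run reuses the same function, and the timings may therefore not match the ones listed above.

```julia
using Distributed

# If Julia was not started with `-p`, add one worker per logical core
# (assumed to roughly match `julia -p auto`).
nprocs() == 1 && addprocs(Sys.CPU_THREADS)

xs = [10; ones(Int, 10)]   # one 1.0 s task plus ten 0.1 s tasks

# Time ten identical pmap calls using the function from the question.
times = Float64[]
for run in 1:10
    t = @elapsed pmap(x -> (sleep(x/10); println(x)), xs)
    push!(times, t)
end

println("elapsed per run (s): ", round.(times; digits = 2))
```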