"Textbook" use of pmap but strange execution times

I have 28 cores (56 hardware threads) and I start Julia with "-p auto".
When I start from a serial (non-parallel) example,

@time for x in [10; ones(Int, 10)]
    sleep(x/10); println(x)
end

its execution time (~2s) makes perfect sense.

However, the execution time of the following (equivalent?) parallelized code does not:

@time res = pmap(x -> (sleep(x/10); println(x)), [10;ones(Int, 10)]);

In fact, it reproducibly takes 7 repeated (identical) executions before the time converges to ~1 s (i.e. 3.2 s, 1.8 s, 1.8 s, 1.8 s, 1.8 s, 1.8 s, 1.1 s, 1.1 s, …).
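For reference, the timings I expect can be worked out from the input vector alone. This is only a back-of-the-envelope sketch: it ignores all scheduling and communication overhead and assumes there are at least as many workers as tasks (11 here), which holds on my machine. The variable names are mine.

```julia
xs = [10; ones(Int, 10)]   # one long task (1 s) followed by ten short ones (0.1 s each)

# Serial: the sleeps simply add up.
serial_time = sum(xs) / 10

# Ideal parallel: every task on its own worker, so the longest task dominates.
ideal_parallel_time = maximum(xs) / 10

println(serial_time)           # prints 2.0
println(ideal_parallel_time)   # prints 1.0
```

So ~2 s for the loop and ~1 s for pmap are the numbers I would consider "right".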

I know Julia compiles functions upon first use (hence the 3.2 s), but why should I have to wait for seven repeated executions before getting the intended parallelization?
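To narrow this down, something along these lines could record which workers handled the tasks in each repetition, so the slow and fast runs can be compared. It is a diagnostic sketch only, not an explanation: `timed_run` is a hypothetical helper, the `println` from my example is replaced by returning `myid()`, and the `addprocs(4)` fallback is an assumption so that the snippet also runs when Julia was started without "-p".

```julia
using Distributed

# Assumption: add a few workers only if none exist yet (i.e. Julia was started without -p).
nprocs() == 1 && addprocs(4)
@everywhere using Distributed

xs = [10; ones(Int, 10)]

# Hypothetical helper: run the pmap once and return (elapsed seconds, worker id per item).
function timed_run(xs)
    t = @elapsed ids = pmap(x -> (sleep(x / 10); myid()), xs)
    return t, ids
end

# Repeat the identical call several times and show which workers took part.
for run in 1:8
    t, ids = timed_run(xs)
    println("run ", run, ": ", round(t; digits = 2), " s, workers used: ", sort(unique(ids)))
end
```

If the set of workers changes from run to run while the time is still above ~1 s, that would at least show whether the slow runs coincide with workers being used for the first time.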