As far as I understand, these allocations come from Julia's internal implementation of the task scheduler. So yes, multithreading adds some overhead, and it is unavoidable. At the same time, the parallelization itself reduces the overall run time significantly, so it’s still a win in the end.
In my experience (leaving aside heavily optimized tools like LoopVectorization.jl), there is a lower bound on the workload size below which this overhead makes the multithreaded version slower than the single-threaded one. But if the run time is in the range of milliseconds or seconds, as in this example, you can safely ignore these allocations.
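If you want to check this on your own machine, here is a minimal sketch that compares allocations and run time for a serial loop and a `Threads.@threads` loop. The functions `fill_serial!`, `fill_threaded!` and `compare` are hypothetical names I made up for illustration, and the workload (`sqrt(x) + 1`) is just a stand-in for your real computation. The exact numbers depend on your Julia version and thread count, so treat the output as a rough indication only.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical workload: write sqrt(x) + 1 into a preallocated output.
function fill_serial!(out, xs)
    for i in eachindex(out, xs)
        @inbounds out[i] = sqrt(xs[i]) + 1
    end
    return out
end

# Same loop, with the iterations split across threads by the scheduler.
function fill_threaded!(out, xs)
    @threads for i in eachindex(out, xs)
        @inbounds out[i] = sqrt(xs[i]) + 1
    end
    return out
end

# Print allocations and run time of both versions for an input of length n.
function compare(n)
    xs = rand(n)
    out = similar(xs)
    # Warm up both functions so that compilation is not measured.
    fill_serial!(out, xs)
    fill_threaded!(out, xs)
    serial_bytes = @allocated fill_serial!(out, xs)
    threaded_bytes = @allocated fill_threaded!(out, xs)
    serial_time = @elapsed fill_serial!(out, xs)
    threaded_time = @elapsed fill_threaded!(out, xs)
    println("n = ", n, ", threads = ", nthreads())
    println("  serial:   ", serial_bytes, " bytes, ", serial_time, " s")
    println("  threaded: ", threaded_bytes, " bytes, ", threaded_time, " s")
end

compare(100)         # tiny workload: scheduling overhead is likely to dominate
compare(10_000_000)  # large workload: the allocations should be negligible
```

Start Julia with several threads (for example `julia -t 4`) before running it, otherwise the threaded version has nothing to parallelize. For more reliable timings you would probably want BenchmarkTools.jl instead of a single `@elapsed` call.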