Parallel is very slow

That doesn’t seem to match what the documentation says. See https://docs.julialang.org/en/stable/manual/parallel-computing/#Parallel-Map-and-Loops-1, last paragraph:

@parallel for can handle situations where each iteration is tiny, perhaps merely summing two numbers.

When I was trying out Julia’s parallel computing capabilities a few weeks ago, I was also surprised by the poor performance, both with @parallel and with @threads. For @threads, the slowdown turned out to be entirely due to the closure performance issue (see the thread "Parallelizing for loop in the computation of a gradient", post #7 by tkoolen). I haven't looked into the details of @parallel, but does it also generate a closure?
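For anyone unfamiliar with the closure issue, here is a minimal sketch of the pattern, assuming Julia 1.x (where @threads is Threads.@threads). The functions scale_boxed! and scale_let! are hypothetical illustrations, not the code from the linked thread. @threads wraps the loop body in a closure. Because `r` is reassigned before that closure captures it, Julia stores `r` in a Core.Box, and the loop body can no longer infer its type. Rebinding it with `let r = r`, the workaround described in the manual's performance tips, gives the closure a fresh variable that is never reassigned.

```julia
using Base.Threads

# Hypothetical example: `r` is conditionally reassigned, then captured by the
# closure that @threads generates, so lowering boxes it (Core.Box) and the
# loop body sees an untyped value on every iteration.
function scale_boxed!(out::Vector{Float64}, x::Vector{Float64}, r::Float64)
    if r < 0
        r = -r
    end
    @threads for i in eachindex(x)
        out[i] = r * x[i]
    end
    return out
end

# Same computation, but the closure captures a new `r` bound once by `let`,
# so no box is needed and the type of `r` is known inside the loop.
function scale_let!(out::Vector{Float64}, x::Vector{Float64}, r::Float64)
    if r < 0
        r = -r
    end
    let r = r
        @threads for i in eachindex(x)
            out[i] = r * x[i]
        end
    end
    return out
end
```

Both functions return the same result. Running `@code_warntype` on `scale_boxed!(zeros(3), [1.0, 2.0, 3.0], -2.0)` should show `r` as a `Core.Box`, while the `let` version should not.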