Parallel is very slow

Yeah, exactly.

You really shouldn’t have to split up the range manually with @threads, though; the macro does that for you. There is an argument for splitting manually (see Parallelizing for loop in the computation of a gradient - #18 by saschatimme), but in this particular case I don’t think there’s any inference issue, at least on Julia 0.7.
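
To make the comparison concrete, here is a minimal sketch of both approaches. It assumes a recent Julia 1.x rather than 0.7, and the function names (square_threads!, square_manual!) and the squaring workload are made up for illustration; they are not from the original code in this thread, and the manual version is only one possible way to chunk the range.

```julia
using Base.Threads

# Let @threads divide the iteration range among the available threads.
function square_threads!(out, xs)
    @threads for i in 1:length(xs)
        out[i] = xs[i]^2
    end
    return out
end

# Manual variant: split 1:n into nthreads() contiguous chunks ourselves
# and hand one chunk to each iteration of the threaded loop.
function square_manual!(out, xs)
    n = length(xs)
    nchunks = nthreads()
    @threads for c in 1:nchunks
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        for i in lo:hi          # empty range when n < nchunks
            out[i] = xs[i]^2
        end
    end
    return out
end

xs = collect(1.0:10.0)
a = square_threads!(similar(xs), xs)
b = square_manual!(similar(xs), xs)
```

Both should fill the output with the same values; the first is shorter and leaves the partitioning to the macro, so I'd only reach for the second if profiling shows the plain @threads loop has a problem.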