Also, if you see performance issues with `@threads` (for example, if removing it actually makes your code faster), then you're probably running into the "performance of captured variables in closures" problem (JuliaLang/julia issue #15276). There's some discussion of the issue in the thread "Parallelizing for loop in the computation of a gradient" (post #7 by tkoolen) as well.
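To make that concrete, here is a minimal sketch of how the problem can show up and the usual `let` workaround. The function names `sum_sq_boxed` and `sum_sq_let` are made up for illustration. It assumes the slowdown comes from a local variable that is assigned more than once and then used inside the `@threads` loop body, which gets turned into a closure.

```julia
using Base.Threads

# `x` is assigned twice and then used inside the @threads body.
# The loop body becomes a closure, so `x` is likely captured as a
# Core.Box, which makes accesses to it type-unstable.
function sum_sq_boxed(v::Vector{Float64})
    x = v
    x = x .* 2                 # second assignment to `x`
    out = zeros(length(v))
    @threads for i in eachindex(out)
        out[i] = x[i]^2
    end
    return sum(out)
end

# Same computation, but `let x = x` introduces a new binding that is
# assigned exactly once, so the closure can capture it without boxing.
function sum_sq_let(v::Vector{Float64})
    x = v
    x = x .* 2
    out = zeros(length(v))
    let x = x
        @threads for i in eachindex(out)
            out[i] = x[i]^2
        end
    end
    return sum(out)
end
```

Both functions return the same result; only the second should avoid the boxed capture. You can check whether a variable is boxed in your own function by running `@code_warntype` on it and looking for `Core.Box` in the output.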