Parallelizing for loop in the computation of a gradient

Regarding Threads.@threads: I was playing around with it myself this weekend and was also surprised by slowdowns and huge increases in allocations compared to the non-threaded version. In my case the cause turned out to be an old nemesis, the performance of captured variables in closures (JuliaLang/julia issue #15276), since @threads creates a closure internally.

You should compare the @code_warntype output for the non-parallelized version to what you get with Threads.@threads. If variable types are no longer properly inferred in the threaded version, the cause is likely what I described above. One of the standard workarounds, which worked in my case, is to wrap the loop in a let block, as described in https://github.com/JuliaLang/julia/issues/15276#issuecomment-318598339. On 0.6.2, however, part of the problem for me was that one of the variables created inside @threads itself (range) is also used in a closure. This has been fixed in master: https://github.com/JuliaLang/julia/pull/24688.
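To make the let workaround concrete, here is a minimal sketch. The function names `grad_boxed!` and `grad_let!` and the variable `scale` are hypothetical, made up for illustration and not taken from the original question. The sketch assumes the problem is a local variable that is assigned more than once and then captured by the closure that @threads builds; your own code may trigger the issue differently.

```julia
using Base.Threads

# Hypothetical example: `scale` is assigned twice before the loop, so the
# closure created by @threads may capture it as a boxed, untyped variable
# (the situation discussed in issue #15276).
function grad_boxed!(g, x)
    scale = 1.0
    scale = 2 * scale
    @threads for i in 1:length(x)
        g[i] = scale * x[i]
    end
    return g
end

# Workaround: rebind `scale` in a let block. The new binding is assigned
# exactly once, so the closure can capture it with a concrete type.
function grad_let!(g, x)
    scale = 1.0
    scale = 2 * scale
    let scale = scale
        @threads for i in 1:length(x)
            g[i] = scale * x[i]
        end
    end
    return g
end
```

Both functions compute the same result; the difference, if any, should show up when you compare `@code_warntype grad_boxed!(zeros(4), ones(4))` with `@code_warntype grad_let!(zeros(4), ones(4))` (in a script you need `using InteractiveUtils` first). Whether the first version actually shows a boxed variable depends on your Julia version.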
