Parallelizing for loop in the computation of a gradient

tkoolen · February 19, 2018, 12:41pm

Regarding Threads.@threads: I was playing around with it myself this weekend and was also surprised by slowdowns and a huge increase in allocations compared to the non-threaded version. In my case the cause turned out to be the old nemesis, the performance of captured variables in closures (https://github.com/JuliaLang/julia/issues/15276), since @threads creates a closure internally.

You should compare the @code_warntype output for the non-parallelized version with what you get with Threads.@threads. If variable types are no longer properly inferred, the issue is likely the one I described above. One of the standard workarounds, which worked in my case, is to use a let block, as described in https://github.com/JuliaLang/julia/issues/15276#issuecomment-318598339. On 0.6.2, however, part of the issue for me was that one of the variables created inside @threads (range) is also used in a closure. This has been fixed on master: https://github.com/JuliaLang/julia/pull/24688.
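
For illustration, here is a minimal sketch of the let-block workaround. The function name scaled_squares! and its arguments are made up for this example and are not from the original poster's code. It assumes the slowdown comes from a variable (scale) that is reassigned before the threaded loop, which is the kind of captured variable that tends to get boxed in the closure @threads creates. The syntax targets current Julia; details may differ on 0.6.

```julia
# Hypothetical example: `scaled_squares!` and its arguments are illustrative only.
function scaled_squares!(out::Vector{Float64}, xs::Vector{Float64}, scale::Float64)
    length(out) == length(xs) || throw(DimensionMismatch("out and xs must have equal length"))

    # `scale` is reassigned here. If it were captured directly by the closure
    # that @threads builds, it could be boxed and lose its inferred type.
    if scale < 0
        scale = -scale
    end

    # Rebinding inside `let` creates a fresh local that is assigned exactly
    # once, so the closure can capture it with a concrete type.
    let scale = scale
        Threads.@threads for i in 1:length(out)
            out[i] = scale * xs[i]^2
        end
    end
    return out
end

xs = [1.0, 2.0, 3.0]
out = zeros(length(xs))
scaled_squares!(out, xs, -2.0)
```

To check whether the workaround helps in your own code, run @code_warntype on the function with and without the let block and see whether any Core.Box or Any entries disappear.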
