Consistent 2x Slowdown with v1.2

swishmas · October 5, 2019, 10:32pm

Hi there,

I have been working on some code (~4,000 lines) for a while, and I was excited about the upcoming changes to parallelization in v1.3. I felt my code was very efficient on v1.1(.1), but on v1.2 or v1.3 it is consistently about twice as slow, in every case and every run.

There’s nothing obvious in the release notes that would explain this slowdown. I do use parallelization (Threads.@threads), but I don’t think the changes would make any difference, and htop in the terminal still shows the code running in parallel.

I was wondering if I missed some change from v1.1(.1) to v1.2 (or v1.3), in the parallelization or anything else, that would cause such a consistent 2x slowdown. Have other people run into this? I can post a few examples shortly, but I’m not even sure where to start looking in the code.

ssfrr · October 5, 2019, 11:24pm

I would start by profiling the code on both 1.1 and 1.2 to see if you can pin down where the difference is. If you use Juno, the @profiler macro is great and gives a nice interactive visualization for exploring the profile.
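Without Juno, a minimal sketch is to run the same script under each Julia version using the Profile standard library and compare the flat reports side by side. `work` below is a hypothetical stand-in for the real entry point of your code.

```julia
using Profile

# Hypothetical stand-in for the real workload; replace with your own entry point.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

work(10)                       # warm up so compilation is not part of the profile
Profile.clear()                # drop samples from any earlier runs
@profile work(10^7)            # collect samples while the workload runs
Profile.print(format = :flat)  # flat per-line sample counts; compare across versions
```

Running this once with julia-1.1 and once with julia-1.2 (same script, same input) makes it easier to see whether the extra time is concentrated in a few lines or spread everywhere.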

-s

swishmas · October 6, 2019, 12:54am

Thank you for the response. Yes, I did profile the code (simply with @profile); I should have mentioned that. It didn’t point to any single function that was slower on its own. Instead, many functions appeared to be slower.

I chose one of the functions, placed @time around its sub-functions, and found that a parallelized loop (using a lot of let statements and Threads.@threads) was producing many more allocations than it should on v1.3.

Checking with @code_warntype didn’t show any type instability, and the single function called inside the loop showed no allocations. So I figured it must be the parallel loop itself or something much deeper.

Edit 1: Commenting out the let statements didn’t help. I had added the let blocks to the loop in a previous version; they removed several type-stability issues and made the code faster overall, with fewer allocations.
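A minimal sketch of the let-plus-@threads pattern, assuming the usual motivation: a variable that is reassigned in the enclosing function gets boxed when a closure captures it, and @threads wraps the loop body in a closure. Rebinding with `let` gives the closure a fresh, single-assignment variable. `kernel` and `scaled_fill!` are hypothetical names, and `@allocated` after a warm-up call is one way to isolate the loop's own allocations. On 1.3 some allocation from the @threads machinery itself (task creation) is likely even when the body allocates nothing.

```julia
using Base.Threads

# Hypothetical non-allocating kernel standing in for the function called in the loop.
kernel(x) = x * x + 1.0

# Hypothetical example: `scale` may be reassigned, which would make a closure box it.
function scaled_fill!(out, xs, scale)
    if scale < 0
        scale = -scale            # reassignment of a variable the loop body uses
    end
    let scale = scale             # fresh binding captured by the @threads closure
        @threads for i in eachindex(xs)
            out[i] = scale * kernel(xs[i])
        end
    end
    return out
end

xs = rand(10^6)
out = similar(xs)
scaled_fill!(out, xs, -2.0)                     # warm-up call so compilation is not measured
@time scaled_fill!(out, xs, -2.0)               # prints elapsed time and allocations
bytes = @allocated scaled_fill!(out, xs, -2.0)  # bytes allocated by one call
println("allocated bytes: ", bytes)
```

Comparing the `@time` allocation count for the same call on 1.1.1 and 1.3 would show whether the extra allocations come from the loop machinery rather than from `kernel`.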

Edit 2: To be precise, I compared v1.1.1 against v1.3, in case that makes any difference.

Ralph_Smith · October 6, 2019, 1:17am

https://github.com/JuliaLang/julia/issues/32701

The upshot is that after v1.1 the multithreading infrastructure was radically overhauled, and for v1.3 we should expect correctness rather than speed.

swishmas · October 6, 2019, 1:41am

Oh my…what a surprise!

Ok, this puts my mind at ease. Thank you!

JesseSantoso · October 10, 2019, 5:07am

I’ve had a very similar issue. My code relies on a parallelised inner loop that is called many times from a serial outer loop. The threading overhead leads to a roughly 5-10x slowdown between 1.1 and 1.2/1.3-rc for my code. So I guess I’m going to stick with 1.1, as there seems to be no intention to fix this for the final 1.3 release…

jling · October 10, 2019, 5:55am

It won’t hurt to try it when it comes out. Also worth trying, depending on your use, is to remove the inner @threads and use @spawn/fetch instead. This can potentially speed things up if your inner loops don’t always finish at the same time and the next outer iteration doesn’t depend on the previous one (probably uncommon).
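A rough sketch of the @spawn/fetch idea, assuming Julia 1.3 or later (where Threads.@spawn exists) and outer iterations that are independent of each other. `inner_work`, `outer_serial` and `outer_spawn` are hypothetical placeholders; each outer iteration becomes its own task instead of one inner loop being split across threads.

```julia
# Hypothetical placeholder for the work previously parallelised with an inner @threads.
function inner_work(k)
    s = 0.0
    for i in 1:10_000
        s += sin(k * i)
    end
    return s
end

# Serial reference version of the outer loop.
outer_serial(n) = [inner_work(k) for k in 1:n]

# One task per independent outer iteration; tasks are scheduled onto the
# available threads (start Julia with JULIA_NUM_THREADS > 1 to get parallelism).
function outer_spawn(n)
    tasks = [Threads.@spawn inner_work(k) for k in 1:n]
    return [fetch(t) for t in tasks]   # fetch waits for each task and returns its value
end

results = outer_spawn(100)
```

Because tasks are load-balanced by the scheduler, a slow iteration no longer holds up the others the way it does at the implicit barrier at the end of each @threads loop.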
