I have been working on some code (~4,000 lines) for a while, and I was excited about the upcoming changes to the parallelization in v1.3. I felt I had very efficient code on v1.1(.1), but when I run the code on v1.2 or v1.3, it is slower by a factor of two in every run.
There’s nothing obvious in the release notes that would cause this slowdown. I do use parallelization (Threads.@threads), but I don’t think the change would make any difference, and htop still shows the code running in parallel.
I was wondering if I missed something between v1.1(.1) and v1.2 (or v1.3), with the parallelization or anything else, that would cause a very consistent 2x slowdown. Have other people run into this? I can post a few examples in a little while, but I’m not even sure where to start looking in the code in the first place.
I would start by profiling the code on both 1.1 and 1.2 to see if you can pin down where the difference is. If you use Juno, the @profiler macro is great and gives a nice interactive visualization to explore the code.
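For anyone not on Juno, here is a minimal sketch of the same idea using only the Profile standard library. The function `work` is a hypothetical stand-in for the real workload, not anything from the original code. Running the same script on both Julia versions and comparing the flat listings is one way to look for the difference.

```julia
using Profile

# Hypothetical stand-in for the real workload (an assumption for illustration)
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return s
end

work(10)                      # run once first so compilation is not profiled
Profile.clear()               # drop any samples left over from earlier runs
@profile work(10_000_000)     # collect samples while the workload runs

# Flat listing, ordered by the number of samples collected per line
Profile.print(format = :flat, sortedby = :count)
```

The warm-up call matters: without it, the first profile is dominated by compilation rather than by the code itself.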
Thank you for the response. Yes, I did profile the code (@profile, simply). I should have mentioned that. It didn’t pinpoint any function in particular that was longer by itself. Instead, a lot of functions appeared to be slower.
I chose one of the functions, placed @time around the sub-functions, and found that a parallelized loop (using a lot of let statements and Threads.@threads) was producing far more allocations on v1.3 than it should have.
Checking with @code_warntype didn’t show any type instability and the single function being called inside the loop showed no allocations. So, I figured it must be the parallel loop or something much deeper.
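A rough sketch of that kind of check, in case it helps someone reproduce it: measure the function on its own, then measure the threaded loop that calls it. The names `kernel` and `threaded_fill!` are hypothetical, and the real loop is certainly more involved. Comparing the two numbers across Julia versions should show whether the extra allocations come from the loop machinery rather than from the function.

```julia
using Base.Threads

# Hypothetical kernel standing in for the function called inside the loop
kernel(x) = x * x + 1.0

# Hypothetical threaded loop that calls the kernel once per element
function threaded_fill!(out, xs)
    @threads for i in eachindex(xs)
        out[i] = kernel(xs[i])
    end
    return out
end

xs = rand(1_000)
out = similar(xs)

# Warm up both calls so compilation is not counted as allocation
kernel(1.0)
threaded_fill!(out, xs)

kernel_bytes = @allocated kernel(1.0)
loop_bytes = @allocated threaded_fill!(out, xs)
println("kernel: ", kernel_bytes, " bytes; threaded loop: ", loop_bytes, " bytes")
```

Start Julia with the JULIA_NUM_THREADS environment variable set to the same value on each version, otherwise the numbers are not comparable.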
Edit 1: Commenting out the let statements didn’t help. I added the let statements to the loop in a previous version; they got rid of several type stability issues and made the code faster overall, with fewer allocations.
Edit 2: I actually tested between v1.1.1 and v1.3 if that makes any difference.
The upshot is that after v1.1 the multithreading infrastructure was radically overhauled, and for v1.3 we should look for correctness rather than speed.
I’ve had a very similar issue. My code relies on a parallelised inner loop which then gets called a bunch of times in a serialised outer loop. The threading overhead leads to a roughly 5-10x slowdown between 1.1 and 1.2/1.3 rc for my code. So I guess I’m just gonna be sticking to 1.1, as there seems to be no intention to fix this for the final 1.3 release version…
It won’t hurt to try it when it comes out. Also worth trying, depending on your use case, is removing the inner @threads and using @spawn with fetch instead. This can potentially speed things up if your inner loops don’t always finish at the same time and the next outer iteration doesn’t depend on the previous one (probably uncommon).
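A minimal sketch of that pattern, assuming Julia 1.3 or later (where Threads.@spawn is available) and assuming the outer iterations really are independent. The function `inner` and the iteration count are made up for illustration.

```julia
using Base.Threads

# Hypothetical inner computation whose cost varies from call to call
function inner(k)
    s = 0.0
    for i in 1:(k * 10_000)
        s += sqrt(i)
    end
    return s
end

# Spawn one task per outer iteration, then collect the results in order.
# Only valid when no iteration depends on the result of another.
function run_spawned(n)
    tasks = [Threads.@spawn(inner(k)) for k in 1:n]
    return [fetch(t) for t in tasks]
end

results = run_spawned(8)
println(length(results), " results collected")
```

Because the scheduler hands out tasks as threads become free, uneven iterations do not leave threads idle the way a statically split loop can. Whether that outweighs the task overhead depends on how much work each task does, so it is worth timing both variants.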