Profiling of Multi-threaded code

Tomas_Pevny · December 9, 2016, 6:22am

Hello all,
I would like to ask for suggestions or recommendations about profiling multi-threaded code in Julia. The single-threaded version works fine, but once I turned on multi-threading, I observed hardly any improvement. From htop I can see that most of the time the threads are not doing any useful computation.
I have found this recent paper about profiling

but I am not sure whether that tool could be used.
A related question: would the Intel VTune profiler help?
Thank you in advance for answers.
Tomas Pevny

Tamas_Papp · December 9, 2016, 7:02am

A minimal working example would help. Lack of any improvement could indicate that the threads are waiting for some common resource, or a result of a computation that is done in a single thread, but it is hard to say without seeing the code.
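To illustrate what "waiting for some common resource" can look like, here is a minimal sketch. It is not the original poster's code: `sum_locked` and `sum_chunked` are hypothetical names, and the sum-of-squares workload is only a stand-in. The first version takes a shared lock on every iteration, so the threads spend most of their time waiting on each other. The second lets each task accumulate privately and combines the partial results once. It assumes Julia 1.3 or later for `Threads.@spawn` and a non-empty input.

```julia
using Base.Threads

# Version 1: every iteration takes the same lock, so the loop is
# effectively serialized no matter how many threads are running.
function sum_locked(x)
    total = Ref(0.0)
    lk = ReentrantLock()
    @threads for i in eachindex(x)
        lock(lk) do
            total[] += x[i]^2
        end
    end
    return total[]
end

# Version 2: each task sums its own chunk into a local variable;
# the partial sums are combined once at the end.
function sum_chunked(x; nchunks = nthreads())
    chunksize = max(1, cld(length(x), nchunks))
    chunks = Iterators.partition(eachindex(x), chunksize)
    tasks = map(chunks) do idxs
        Threads.@spawn begin
            s = 0.0
            for i in idxs
                s += x[i]^2
            end
            s
        end
    end
    return sum(fetch, tasks)
end
```

Timing both with `@elapsed` on the same input (after a warm-up call) should give a rough idea of how much a shared resource costs on a given machine.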

ChrisRackauckas · December 9, 2016, 7:12am

Without knowing your code, it is impossible to know what’s going on. However, I do know that there’s a known issue which can occur. Have you checked to see if it’s due to an inference bug by using function barriers?

github.com/JuliaLang/julia

No performance scaling in threading

opened 07:55AM - 13 Jul 16 UTC

closed 09:18PM - 08 Oct 18 UTC

ranjanan

performance multithreading

Consider the following code:

```julia
using Base.Threads

println("Number of threads = $(nthreads())")

x = rand(10^6)
y = zeros(10^6)

println("Warmup!")
for i = 1:10^6
    y[i] = sin(x[i])^2 + cos(x[i])^2
end

t1 = @elapsed for i = 1:10^6
    y[i] = sin(x[i])^2 + cos(x[i])^2
end
@assert sum(y) == 10^6

t2 = @elapsed @threads for i = 1:10^6
    y[i] = sin(x[i])^2 + cos(x[i])^2
end
@assert sum(y) == 10^6

println("Serial time = $t1")
println("Parallel time = $t2")
```

The output recorded on OSX is:

```
Number of threads = 4
Warmup!
Serial time = 0.239838418
Parallel time = 0.282324893
```

The output recorded on Linux is:

```
Number of threads = 4
Warmup!
Serial time = 0.227327406
Parallel time = 0.542067206
```

Tomas_Pevny · December 9, 2016, 7:24am

Unfortunately, I cannot provide a simple working example, since the code parallelises the calculation of the gradient of a neural network, and I can publish neither the library nor the data.
That is why I have been asking whether there is a general way to find a bottleneck in the code, something similar to the profiler available for single-threaded applications.

I would be curious whether Intel VTune Amplifier would be of any help. I am sure there is a bottleneck somewhere that totally kills the parallelisation, but I do not know where.

ChrisRackauckas · December 9, 2016, 7:37am

Use @code_warntype first and see if there’s an inference problem. If so, you might be running into what I linked, or one of the issues linked from that issue. The workarounds are also given in that issue.
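For readers unfamiliar with the two ideas mentioned here, a minimal sketch follows. It is only an illustration under assumptions: `Params`, `kernel!` and `apply!` are hypothetical names, not taken from the linked issue or from the original poster's code. The struct has an abstractly typed field, so `@code_warntype` flags the value read from it as `Any`; passing that value into a separate function (the function barrier) lets the hot loop be compiled for the concrete type.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# Hypothetical container with an abstractly typed field: reading `p.w`
# tells the compiler only that the value is `Any`.
struct Params
    w::Any
end

# The hot loop lives in its own function, so it is compiled for the
# concrete type of `w` at the point of the call.
function kernel!(y, w, x)
    for i in eachindex(y, x)
        y[i] = w[i] * x[i]
    end
    return y
end

function apply!(y, p::Params, x)
    w = p.w                   # inferred as Any in this function
    return kernel!(y, w, x)   # one dynamic dispatch, then a fast loop
end

p = Params(rand(4)); x = rand(4); y = zeros(4)
@code_warntype apply!(y, p, x)      # `w` is reported as Any
@code_warntype kernel!(y, p.w, x)   # all types are concrete
```

The same pattern can be tried with a threaded loop: move the `@threads for` into its own function and pass everything it needs as arguments, rather than running it at global scope.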

Tomas_Pevny · December 14, 2016, 2:44pm

Hi,
thanks for the suggestions. I have tried @code_warntype and it did not show any problems.
I have been playing further with the threading and found that if I set the number of threads used by OpenBLAS to one, I see a speed improvement. It seems to me that there was a problem with allocating threads to cores, which led to overhead.
So in the end, I have seen about a two-fold improvement. What is weird is that as the algorithm progresses, I suddenly see a drop in the speed-up, i.e. the speed falls back to that of the single-threaded case or even worse.
With the single-threaded application, I do not see any issues like this, which makes this really hard to debug.
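For reference, limiting the BLAS library to one thread can be sketched as below. This assumes a current Julia version, where the function lives in the `LinearAlgebra` standard library; `gram_all` is a hypothetical workload, not the code discussed in this thread. The idea is that Julia threads provide the parallelism while each BLAS call stays single-threaded, so the two thread pools do not compete for the same cores.

```julia
using LinearAlgebra
using Base.Threads

# One BLAS thread per call; parallelism comes from the Julia threads below.
BLAS.set_num_threads(1)

# Hypothetical workload: one matrix product per input matrix,
# spread across Julia threads. Each iteration writes to its own slot.
function gram_all(mats)
    out = Vector{Matrix{Float64}}(undef, length(mats))
    @threads for i in eachindex(mats)
        out[i] = mats[i]' * mats[i]
    end
    return out
end

mats = [rand(50, 50) for _ in 1:8]
grams = gram_all(mats)
```

With OpenBLAS, setting the environment variable `OPENBLAS_NUM_THREADS=1` before starting Julia is another way to get the same effect.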
