Poor performance while multithreading (Julia 1.0)

dkiese · February 2, 2019, 11:00pm

Let me break my answer into pieces according to your questions.

For my test parameters, @btime gives 2.89 s without threads, 4.11 s with 2 threads, 3.92 s with 3 threads and 3.71 s with 4 threads (my machine has 4 physical cores). So although there seems to be some scaling with Threads.nthreads(), the threaded runs never get into the range of the non-threaded loops.
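For readers who want to reproduce this kind of comparison, here is a minimal sketch that times the same loop with and without Threads.@threads. The function `work` and both `fill_*!` helpers are hypothetical stand-ins, not the code from this thread, and `@elapsed` from Base is used instead of `@btime` so that no package is needed.

```julia
# Hypothetical stand-in for the real per-element work.
work(i) = sum(sin(i * x) for x in 0.0:1e-3:1.0)

# Plain serial loop.
function fill_serial!(out)
    for i in 1:length(out)
        out[i] = work(i)
    end
    return out
end

# Same loop, with iterations distributed over the available threads.
function fill_threaded!(out)
    Threads.@threads for i in 1:length(out)
        out[i] = work(i)
    end
    return out
end

out_serial = zeros(200)
out_threaded = zeros(200)

# Run both once first so that compilation time is not measured.
fill_serial!(out_serial)
fill_threaded!(out_threaded)

t_serial = @elapsed fill_serial!(out_serial)
t_threaded = @elapsed fill_threaded!(out_threaded)
println("threads = ", Threads.nthreads(),
        ", serial = ", t_serial, " s, threaded = ", t_threaded, " s")
```

The number of threads is fixed when Julia starts (for example through the JULIA_NUM_THREADS environment variable), so each thread count needs its own session.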

The structs consist of 3 arrays. Field1 has at most ~10^2 entries; in the tests it has about 10. The other two contain up to 10^7 entries each; in the tests it is roughly a thousand each.

struct MyStruct
    Field1 :: Vector{Float64}
    Field2 :: Array{Float64, 4}
    Field3 :: Array{Float64, 4}
end
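To make the sizes concrete, the following sketch builds a test-sized instance and computes how many bytes the three arrays hold. The shape (6, 6, 6, 5) is an assumption chosen only to give roughly a thousand entries; the real dimensions are not stated here.

```julia
struct MyStruct
    Field1 :: Vector{Float64}
    Field2 :: Array{Float64, 4}
    Field3 :: Array{Float64, 4}
end

# Test-sized instance: 10 entries in Field1 and 6*6*6*5 = 1080 entries
# in each 4-dimensional array. The shape is an assumption.
s = MyStruct(rand(10), zeros(6, 6, 6, 5), zeros(6, 6, 6, 5))

# Bytes of array data (8 bytes per Float64).
nbytes = sizeof(s.Field1) + sizeof(s.Field2) + sizeof(s.Field3)
println("array data: ", nbytes, " bytes")
```

At the test size all three arrays together take under 20 kB. At 10^7 entries, a single Float64 array takes 8 * 10^7 bytes, about 80 MB.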

The computeKernel() functions compute one-dimensional integrals, so inside them I define inner functions that are passed as kernels to an integration function. In that process I temporarily allocate a buffer that stores intermediate results.
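The pattern described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: `integrate` is a hypothetical fixed-grid trapezoid rule standing in for the real integration routine, and the integrand and the parameter `p` are made up.

```julia
# Hypothetical trapezoid rule; stands in for the real integration function.
function integrate(f, a, b; n = 1000)
    # Temporary buffer for intermediate results, allocated on every call.
    buffer = Vector{Float64}(undef, n + 1)
    h = (b - a) / n
    for k in 0:n
        buffer[k + 1] = f(a + k * h)
    end
    # Trapezoid rule: the two end points get half weight.
    return h * (sum(buffer) - (buffer[1] + buffer[end]) / 2)
end

# Hypothetical kernel: the inner function captures p and is passed
# to the integration routine as the integrand.
function computeKernel(p)
    integrand(x) = exp(-p * x^2)
    return integrate(integrand, 0.0, 1.0)
end

result = computeKernel(1.0)
bytes = @allocated computeKernel(1.0)
println("integral = ", result, ", bytes allocated per call = ", bytes)
```

The `@allocated` line only makes the per-call allocation visible, which is worth knowing when such a function is called from inside a threaded loop.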

Within the calculation, elements of one struct are used to compute elements of the other struct, so there is in fact quite a lot of data access (so cache might be a concern?).

I am not sure how to test whether two threads write into the same cache line… The only thing I can say is that the loops are definitely thread-safe, in the sense that no chunk of memory is simultaneously accessed for both reading and writing.
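One rough way to approach the cache-line question is to compare the addresses of the elements at the boundaries between the index ranges handled by different threads. The sketch below assumes a 64-byte cache line and a contiguous, static split of the index range, which only approximates how Threads.@threads divides the work. The helpers `cacheline`, `chunks` and `shared_boundaries` are hypothetical names.

```julia
const CACHE_LINE = 64   # bytes; an assumption, typical for x86-64

# Number of the cache line that holds element i of A.
cacheline(A, i) = UInt(pointer(A, i)) ÷ CACHE_LINE

# Split 1:n into nchunks contiguous ranges (hypothetical static partition).
function chunks(n, nchunks)
    len, extra = divrem(n, nchunks)
    ranges = UnitRange{Int}[]
    start = 1
    for c in 1:nchunks
        # The first `extra` chunks get one additional element.
        stop = start + len - 1 + (c <= extra ? 1 : 0)
        push!(ranges, start:stop)
        start = stop + 1
    end
    return ranges
end

# Count the chunk boundaries where the last element of one chunk and
# the first element of the next chunk lie in the same cache line.
function shared_boundaries(A, nchunks)
    rs = chunks(length(A), nchunks)
    return count(k -> cacheline(A, last(rs[k])) == cacheline(A, first(rs[k + 1])),
                 1:nchunks-1)
end

A = zeros(1000)
println(shared_boundaries(A, 4), " of 3 chunk boundaries share a cache line")
```

A shared boundary line only matters if at least one of the two threads writes to it, and with contiguous chunks it affects at most a few elements per boundary. The check says more when threads write to interleaved indices or to small per-thread accumulators stored next to each other.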
