Poor performance while multithreading (Julia 1.0)

@foobar_lv2

Let me break my answer into pieces according to your questions.

With my test parameters, @btime reports 2.89 s without threads, 4.11 s with 2 threads, 3.92 s with 3 threads and 3.71 s with 4 threads (my machine has 4 physical cores). So although the time does drop as Threads.nthreads() increases, the threaded version never gets close to the non-threaded loop.
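For comparison, here is a minimal, dependency-free sketch that times a serial loop against a Threads.@threads loop. The function work and the array size are hypothetical stand-ins for the real loop body. It uses @elapsed from Base instead of BenchmarkTools' @btime, so each version is called once first to keep compilation time out of the measurement. Julia 1.0 takes the thread count from the JULIA_NUM_THREADS environment variable, e.g. JULIA_NUM_THREADS=4.

```julia
# Hypothetical stand-in for one loop iteration: some floating-point work per element.
work(x) = sum(sin(x * k) for k in 1:200)

function serial!(out, xs)
    for i in eachindex(xs)
        out[i] = work(xs[i])
    end
    return out
end

function threaded!(out, xs)
    # Each iteration writes only out[i], so iterations are independent.
    Threads.@threads for i in eachindex(xs)
        out[i] = work(xs[i])
    end
    return out
end

xs = rand(10_000)
out_s = similar(xs)
out_t = similar(xs)

# Warm-up calls so that compilation is not included in the timings.
serial!(out_s, xs)
threaded!(out_t, xs)

t_serial = @elapsed serial!(out_s, xs)
t_threaded = @elapsed threaded!(out_t, xs)
println("threads = ", Threads.nthreads(), ", serial: ", t_serial, " s, threaded: ", t_threaded, " s")
```

If even a trivially parallel body like this does not speed up, the problem is likely in the setup; if it does, the overhead is more likely in the real loop body (allocation, memory traffic).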

Each struct consists of three arrays. Field1 has at most ~10^2 entries; in the tests it has about 10. The other two contain up to 10^7 entries each; in the tests, roughly a thousand each.

```julia
struct MyStruct
    Field1::Vector{Float64}
    Field2::Array{Float64, 4}
    Field3::Array{Float64, 4}
end
```
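To put these sizes in perspective, a small sketch that builds a test-sized instance and computes the memory footprint of one 4-D field. The shape (10, 10, 10, 1) is a hypothetical choice giving the roughly 1000 entries used in the tests; the real dimensions are not stated.

```julia
struct MyStruct
    Field1::Vector{Float64}
    Field2::Array{Float64, 4}
    Field3::Array{Float64, 4}
end

# Test-sized instance; the 4-D shape is a hypothetical choice with 1000 entries.
s = MyStruct(rand(10), rand(10, 10, 10, 1), rand(10, 10, 10, 1))

# sizeof on an Array gives the size of its data in bytes (8 bytes per Float64).
test_bytes = sizeof(s.Field2)           # 1000 * 8 bytes
full_bytes = 10^7 * sizeof(Float64)     # full size: up to 10^7 entries
println("test field: ", test_bytes, " bytes; full-size field: ", full_bytes ÷ 10^6, " MB")
```

At about 8 KB per field the test data fits easily in a per-core cache, whereas a full-size field (80 MB) does not, so timings from the small test case may not carry over to production sizes.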

The computeKernel() functions compute one-dimensional integrals, so inside them I define inner functions that are passed as kernels to an integration routine. In the process, I temporarily allocate a buffer that stores intermediate results.
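A minimal sketch of this pattern, assuming a hypothetical trapezoidal-rule integrator trapz in place of whatever integration routine is actually used, and a made-up integrand. The first version allocates its buffer on every call; the second takes the buffer as an argument, so the caller can allocate it once per thread or per chunk of work and reuse it. Allocation inside a threaded loop can be costly in Julia 1.0 because garbage collection stops all threads, so avoiding it is one commonly suggested change.

```julia
# Hypothetical integrator: composite trapezoidal rule on n subintervals.
function trapz(f, a, b, n)
    h = (b - a) / n
    s = (f(a) + f(b)) / 2
    for k in 1:n-1
        s += f(a + k * h)
    end
    return s * h
end

# Allocating pattern: a fresh buffer on every call.
function computeKernel_alloc(p, n)
    buf = zeros(2)              # temporary buffer for intermediate results
    function kernel(x)          # inner function (closure) passed to the integrator
        buf[1] = p * x
        buf[2] = buf[1]^2
        return buf[2]
    end
    return trapz(kernel, 0.0, 1.0, n)
end

# Same computation, but the caller supplies the buffer so it can be reused.
function computeKernel!(buf, p, n)
    kernel(x) = (buf[1] = p * x; buf[2] = buf[1]^2; buf[2])
    return trapz(kernel, 0.0, 1.0, n)
end
```

After a warm-up call, @allocated computeKernel!(buf, 2.0, 1000) is a quick way to check whether reusing the buffer actually removed the allocations.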

Within the calculation, elements of one struct are used to compute elements of the other, so there is quite a lot of memory access (so the cache might be a concern?).
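A sketch of such a loop, with a hypothetical per-element update update_entry! that reads from one struct (src) and writes to the other (dst). The index range is split into one contiguous chunk per thread, and each chunk allocates its buffer once; because the buffer is local to the chunk, this does not rely on Threads.threadid() or on how the scheduler assigns iterations.

```julia
struct MyStruct
    Field1::Vector{Float64}
    Field2::Array{Float64, 4}
    Field3::Array{Float64, 4}
end

# Hypothetical per-element update: reads from `src`, writes one entry of `dst`.
function update_entry!(dst, src, i, buf)
    buf[1] = src.Field2[i] + src.Field3[i]   # linear indexing into the 4-D arrays
    buf[2] = buf[1] * src.Field1[1]
    dst.Field2[i] = buf[2]
    return nothing
end

function update_threaded!(dst, src)
    N = length(dst.Field2)
    nchunks = Threads.nthreads()
    # One iteration per contiguous chunk of 1:N; each chunk owns its buffer.
    Threads.@threads for c in 1:nchunks
        lo = div((c - 1) * N, nchunks) + 1
        hi = div(c * N, nchunks)
        buf = zeros(2)
        for i in lo:hi
            update_entry!(dst, src, i, buf)
        end
    end
    return dst
end
```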

I'm not sure how to test whether two threads write into the same cache line… The only thing I can say is that the loops are definitely thread-safe, in the sense that no chunk of memory is accessed simultaneously for reading and writing.
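One rough check, assuming each thread writes a contiguous index range of the same array and assuming 64-byte cache lines (typical on x86-64; the code does not query the hardware): compare the cache line of the last element one thread writes with that of the first element the next thread writes. The helpers cacheline and shared_lines are hypothetical names. For measuring contention on real hardware, Linux's perf c2c can report cache lines shared between cores from hardware counters.

```julia
# Assumption: 64-byte cache lines; not read from the CPU.
const CACHE_LINE = 64

# Index of the cache line that holds element i of array A.
cacheline(A, i) = UInt(pointer(A, i)) ÷ CACHE_LINE

# chunks[k] = (lo, hi) is the index range written by thread k.
# Returns the pairs of neighbouring chunks whose boundary elements share a cache line.
function shared_lines(A, chunks)
    shared = Tuple{Int,Int}[]
    for k in 1:length(chunks)-1
        last_k = chunks[k][2]
        first_next = chunks[k+1][1]
        if cacheline(A, last_k) == cacheline(A, first_next)
            push!(shared, (k, k + 1))
        end
    end
    return shared
end

A = zeros(1000)
chunks = [(1, 250), (251, 500), (501, 750), (751, 1000)]
println(shared_lines(A, chunks))
```

With contiguous chunks, each boundary can share at most one cache line, so a handful of shared lines is unlikely to account for a large slowdown on its own.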