I have a problem that I don’t know how to solve when using multithreading to build a matrix. I have a function that returns a single matrix element after operating on large arrays. I’m using @views to avoid copying the data.
function F(m, n, phi0, K_array, P_list)::ComplexF64
    return @views ...do simple computation over phi0, K_array, P_list
end
Applying F once gives (using @btime from BenchmarkTools)
55.524 μs (700 allocations: 39.84 KiB)
Now I use a machine with 100 cores to fill the strict upper triangle of an Nsite × Nsite matrix using @Threads.threads, as follows:
function F_array(phi0, K_array, P_list, Nsite)
    result = zeros(ComplexF64, Nsite, Nsite)
    @Threads.threads for m in 1:Nsite
        for n in m+1:Nsite
            result[m, n] = F(m, n, phi0, K_array, P_list)
        end
    end
    return result
end
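A quick sanity check worth running before timing a loop like this is to confirm how many threads the session actually has. This is only a sketch and was not part of the measurements reported here; the variable name `nt` is just for illustration.

```julia
# Sanity check (not part of the timings in this post):
# how many threads does the current Julia session have?
# By default this is 1 unless Julia is started with `-t N` / `--threads N`
# or the JULIA_NUM_THREADS environment variable is set.
nt = Threads.nthreads()
println("Julia threads available: ", nt)
```

If this prints 1, the `@threads` loop runs serially regardless of how many cores the machine has.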
When I apply F_array (with Nsite = 4000) I get
416.801 s (8381104705 allocations: 345.49 GiB)
The amount of allocation is huge, and it makes the computation very slow.
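To put these numbers in context, here is a rough back-of-the-envelope estimate that scales the single-call @btime figures up to the number of (m, n) pairs with m < n. It is only a sketch: it assumes every call to F costs the same as the one benchmarked call, and the variable names are made up for illustration.

```julia
# Scale the single-call @btime figures to the whole upper triangle.
# Assumption: every call to F costs the same as the benchmarked call.
Nsite = 4000
npairs = Nsite * (Nsite - 1) ÷ 2      # number of (m, n) pairs with m < n

t_call = 55.524e-6                    # seconds per call of F
allocs_call = 700                     # allocations per call of F
kib_call = 39.84                      # KiB allocated per call of F

serial_time = npairs * t_call                 # seconds if run on a single thread
total_allocs = npairs * allocs_call           # total allocation count
total_gib = npairs * kib_call / 1024^2        # KiB -> GiB

println("calls:       ", npairs)
println("serial time: ", round(serial_time; digits = 1), " s")
println("allocations: ", total_allocs)
println("memory:      ", round(total_gib; digits = 1), " GiB")
```

This gives about 8.0 million calls, roughly 444 s of single-threaded work, about 5.6 × 10^9 allocations and about 304 GiB. The measured 416.8 s is close to the single-threaded estimate, which suggests that the threaded loop is getting almost no speedup, while the measured allocation count is about 1.5 times the estimate.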
I have tried replacing F with a simple function that has a similar run time and allocation count. Constructing the matrix with that simple function takes roughly 500 ms, so I expected F_array to take a similar time.
I suspect it has to do with the fact that F operates on large arrays. Is there a way to fix this problem?
Any other insight would be greatly appreciated.