Thanks for the simplified code.
I think you might gain quite some performance by reusing the memory of the matrix. Assuming you can write a version of make_matrix that constructs the matrix into a preallocated matrix (called make_matrix! below), you could try:
using ChunkSplitters
using LinearAlgebra  # provides eigvals!

function calc_function(x)
    kvalue_range = range(0, 0.5, 201)
    result_vector = zeros(201)
    # @sync waits for all spawned tasks to finish before result_vector is integrated
    # chunking the indices means each chunk holds positions in kvalue_range
    @sync for chunk in chunks(eachindex(kvalue_range); n=Threads.nthreads())
        Threads.@spawn begin
            temp_matrix = zeros(1000, 1000) # or whatever type/size the matrix needs to have
            for k_ind in chunk
                kvalue = kvalue_range[k_ind]
                # make_matrix! overwrites temp_matrix with the (thousands x thousands)
                # matrix for this kvalue; each kvalue needs a different matrix
                energy = eigvals!(make_matrix!(temp_matrix, kvalue))
                result_vector[k_ind] = sub_calculation(energy)
            end
        end
    end
    # integrate uses the trapezoidal rule: just adding values
    result = integrate(result_vector)
    return result
end
This allocates only one matrix per task, which here means one per thread, since the number of chunks equals Threads.nthreads(). You could profile sub_calculation separately to make sure it does not allocate, or else try to preallocate some more there. integrate is probably less important, as it is called far less often, but you should still profile it to verify that it does not take a large share of the runtime.
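In case it helps, here is a rough sketch of what an in-place make_matrix! and the allocation check could look like. I don't know your actual matrix or sub_calculation, so the tridiagonal entries, the body of sub_calculation, and the helper measure_allocations are all made-up placeholders; only the pattern (overwrite the buffer, return it, then measure with @allocated inside a function) is the point.

```julia
using LinearAlgebra

# Hypothetical in-place builder: overwrites M with a k-dependent symmetric
# tridiagonal matrix. The entries are placeholders, not your real model.
function make_matrix!(M::AbstractMatrix, kvalue)
    fill!(M, zero(eltype(M)))      # reset the buffer left over from the previous k
    n = size(M, 1)
    for i in 1:n
        M[i, i] = 2 * cos(kvalue * i)
        if i < n
            M[i, i+1] = -1
            M[i+1, i] = -1
        end
    end
    return M                       # return the buffer so the call nests inside eigvals!
end

# Hypothetical stand-in for the per-k post-processing.
sub_calculation(energy) = sum(abs2, energy)

# Measure inside a function so that global-scope overhead is not counted.
function measure_allocations(temp_matrix, energy, kvalue)
    bytes_matrix = @allocated make_matrix!(temp_matrix, kvalue)
    bytes_sub = @allocated sub_calculation(energy)
    return bytes_matrix, bytes_sub
end

temp_matrix = zeros(200, 200)
energy = eigvals!(make_matrix!(temp_matrix, 0.1))

measure_allocations(temp_matrix, energy, 0.1)   # first call also compiles
bytes_matrix, bytes_sub = measure_allocations(temp_matrix, energy, 0.1)
println("make_matrix!: ", bytes_matrix, " bytes, sub_calculation: ", bytes_sub, " bytes")
```

If either number comes out clearly nonzero for your real functions, that is the place to look for more things to preallocate. Note that eigvals! itself still allocates its output vector and some workspace, so the loop will not become completely allocation-free this way.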