Memory allocations and performance with multithreading

x_vals at step n depends on x_vals at step n-1, so there is no trivial way to parallelize this over the outer (time) loop.
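For reference, the serial loop structure under discussion presumably looks something like the sketch below. Both `compute_serial` and this particular `update` are hypothetical stand-ins, not the original poster's code; any pure function of `(x, Δt)` would show the same dependency.

```julia
# Hypothetical stand-in for the thread's `update` (assumption, not the original definition).
update(x, Δt) = x + Δt * (-x)

function compute_serial(x0_vals, Δt, nΔt)
    x_vals = copy(x0_vals)
    for n in 1:nΔt                    # step n needs the result of step n-1
        for j in eachindex(x_vals)    # within one step, elements are independent
            x_vals[j] = update(x_vals[j], Δt)
        end
    end
    return x_vals
end
```

The outer iterations have to run in order, which is why putting `Threads.@threads` on the `n` loop would give wrong results.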

In this particular example it is possible to swap the order of the inner and outer loops, since there is no interaction between x_vals[i] and x_vals[j]:

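function compute_threaded(x0_vals, Δt, nΔt)
    x_vals = copy(x0_vals)
    nx = length(x0_vals)

    # Each j is handled by exactly one thread, so writes to x_vals[j] never race.
    Threads.@threads for j in 1:nx
        # The time loop stays sequential for each element.
        for n in 1:nΔt
            x_vals[j] = update(x_vals[j], Δt)
        end
    end
    return x_vals
end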

But this is probably not what the author wants, since we can assume that in a more complicated MWE the elements do depend on each other. In that case, one can use something like the approach in "Nbabel nbody integrator speed up - #30 by Skoffer", but as has been said a few times already, such approaches pay off only for rather large vectors.
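When elements are coupled, one generic option is to keep the time loop sequential and thread only the loop over elements, reading from one buffer and writing to another. The sketch below is not the method from the linked post, just a minimal double-buffering illustration; `update_coupled`, `step!`, and `compute_threaded_coupled` are hypothetical names, and the nearest-neighbour coupling with periodic boundaries is an assumption made for the example.

```julia
# Hypothetical coupled update: element j depends on its two neighbours (periodic).
function update_coupled(x, j, Δt)
    nx = length(x)
    left  = x[mod1(j - 1, nx)]
    right = x[mod1(j + 1, nx)]
    return x[j] + Δt * (left - 2 * x[j] + right)
end

# One time step: threads only read x_old and each writes a distinct x_new[j].
function step!(x_new, x_old, Δt)
    Threads.@threads for j in eachindex(x_old)
        x_new[j] = update_coupled(x_old, j, Δt)
    end
    return x_new
end

function compute_threaded_coupled(x0_vals, Δt, nΔt)
    x_old = copy(x0_vals)
    x_new = similar(x_old)
    for n in 1:nΔt                     # time steps stay sequential
        step!(x_new, x_old, Δt)
        x_old, x_new = x_new, x_old    # swap buffers instead of allocating
    end
    return x_old
end
```

Keeping the threaded loop in its own function means the swapped variables are not captured by the `@threads` closure, which avoids boxing and the allocations that come with it. Threads are still scheduled once per time step, so this only helps when the vector is large enough to amortize that overhead.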
