Hi all. New Julia user here. I want to significantly increase performance/reduce the amount of memory allocated in a function similar to the MWE below. Basically I need to index into a vector using a matrix of indices, multiply the resulting elements elementwise by another matrix, and then sum across the columns of the result (one sum per row).
```julia
using BenchmarkTools

n = 300
coefficients = rand(n^2, 4);
state = ones(n);
randidxs = rand(1:n, n^2, 4);
result = zeros(n^2);

function viewmultsum!(result, coefficients, state, randidxs)
    @views sum!(result, coefficients .* state[randidxs])
end;

@benchmark viewmultsum!(result, coefficients, state, randidxs)
```
```
BenchmarkTools.Trial: 5263 samples with 1 evaluation.
 Range (min … max):  581.567 μs …   9.461 ms  ┊ GC (min … max):  0.00% … 90.38%
 Time  (median):     760.413 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   941.917 μs ± 660.843 μs  ┊ GC (mean ± σ):  17.49% ± 18.97%

 Memory estimate: 2.75 MiB, allocs estimate: 2.
```
I’ll need to call a function similar to this several thousand times while integrating a very large system of ODEs (n > 1e4), so it is performance critical. I thought using `@views` and summing in place with `sum!` would mean that I don’t have to allocate much memory. I only have 2 allocations in the MWE, but they seem quite large and I don’t know exactly where they come from.
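For reference, here is a hand-written loop version I sketched out as a point of comparison. I'm not sure it is equivalent or idiomatic, and `loopmultsum!` is just a name I made up:

```julia
# Rough loop translation of the MWE above (my own sketch, not verified to be optimal):
# result[i] = sum over j of coefficients[i, j] * state[randidxs[i, j]]
function loopmultsum!(result, coefficients, state, randidxs)
    fill!(result, 0.0)
    # column-major order: inner loop over rows i for contiguous access
    @inbounds for j in axes(coefficients, 2), i in axes(coefficients, 1)
        result[i] += coefficients[i, j] * state[randidxs[i, j]]
    end
    return result
end
```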
I know accessing the elements of `state` in a random manner is not ideal for cache performance, but in the real ODE system there is no structured way I can access its elements. In short, the poor cache performance may be unavoidable.
Is there any way I can reduce the allocations in the MWE? Or is the best bet for performance improvement to look at parallelization using hardware like GPUs? Any other performance tips/pointers are much appreciated.
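In case it clarifies what I mean by reducing allocations: one idea I've been toying with is pre-allocating the broadcast temporary once and reusing it across calls. `buffered_multsum!` is a hypothetical name, and I haven't verified that this is actually faster or even allocation-free in practice:

```julia
# Hypothetical variant: allocate the scratch matrix once, outside the hot loop, and reuse it.
buffer = similar(coefficients)   # n^2 × 4 scratch space

function buffered_multsum!(result, buffer, coefficients, state, randidxs)
    @views buffer .= coefficients .* state[randidxs]   # fused broadcast, writes into buffer
    sum!(result, buffer)                               # row sums, in place
    return result
end
```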