When creating a slice of a vector in a for loop, it seems like Julia creates a new vector each time which then needs to be dealt with by the garbage collector. In my setting, sometimes the GC can significantly slow down the program. As a MWE:
function f_vec(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    for i in 1:n
        r = rand(1:length(data)-101)
        x = data[r:r+100]   # allocates a fresh 101-element Vector on every iteration
        out[i] = sum(x)
    end
    out
end
@benchmark f_vec(data) setup=(data=randn(10000))
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  103.800 μs …  1.461 ms  ┊ GC (min … max): 0.00% … 80.05%
#  Time  (median):     111.400 μs              ┊ GC (median):    0.00%
#  Time  (mean ± σ):   131.199 μs ± 80.903 μs  ┊ GC (mean ± σ):  6.53% ± 10.09%
#  [histogram omitted]
#  104 μs          Histogram: log(frequency) by time          584 μs <
#  Memory estimate: 882.94 KiB, allocs estimate: 1001.
The median of that timing is fine; however, the max time with 80% GC creates problems for me. For example, if I change to n=10_000_000, the run takes 30 seconds instead of the 1 second implied by the median time. I can fix this by changing to x = @view data[r:r+100], but in my more specific setting I cannot guarantee that the slices will always be contiguous (they might be a random set of indices).
So my main question is, why does Julia create a new vector each time for x and then need to collect it? Is there a way to specify that I want to overwrite the same place in memory? Would that help? Is there another way to solve this?
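For concreteness, here is the @view variant mentioned above (a sketch; the name f_view is mine):

```julia
# Same as f_vec, but x is a SubArray wrapping data -- no copy is made,
# so the loop body does not allocate and the GC has nothing to collect.
function f_view(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    for i in 1:n
        r = rand(1:length(data)-101)
        x = @view data[r:r+100]   # view into data, not a new Vector
        out[i] = sum(x)
    end
    out
end
```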
Yes, though as I understand it, doing so can sometimes be slower, since the slice is not contiguous in memory. In my setting, it might still be worth it to avoid GC.
function f(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    x = Vector{Float64}(undef, 101)   # reusable buffer, allocated once
    for i in 1:n
        r = rand(1:length(data)-101)
        x .= @view data[r:r+100]      # without @view, the RHS slice still allocates
        out[i] = sum(x)
    end
    out
end
Only if you want to reuse the same buffer in other functions, i.e. if you use this pattern in several places. Otherwise not really. And I'm still not sure it's any better than just using x = @view ....
is the way to go. There's normally no cost to creating the view; it should just directly retrieve the data points into the buffer.
But if you subsequently iterate over the buffer just a single time, for example with sum(buffer), then the copy is a waste of time, and directly summing the view, sum(@view data[inds]), is as good as it gets: you traverse the data once, summing as you go. No need to traverse the data twice when once will do.
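A minimal sketch of that trade-off (data, inds, and buf are hypothetical names):

```julia
data = randn(10_000)
inds = rand(1:10_000, 101)   # non-contiguous indices

# One pass: sum the view directly -- no copy at all.
s1 = sum(@view data[inds])

# Two passes: copy into a buffer, then sum.
# Only worth it if the buffer is reused for several computations.
buf = Vector{Float64}(undef, length(inds))
copyto!(buf, @view data[inds])
s2 = sum(buf)

s1 ≈ s2   # the two approaches agree
```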
If the indices are non-contiguous and you will access them multiple times, collecting them in a buffer might be worth it.
Based on your example, a view seems to be exactly what you want.
PS. If you have an array of indices (instead of a range), it might be worthwhile to forgo the index array entirely, and just compute the desired indices/elements one by one in a loop (either to store into a buffer for subsequent calculations, or to directly calculate something like a sum with them). There's no need to bend over backwards to avoid writing your own performance-critical inner loops.
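A sketch of what that could look like, assuming (hypothetically) that the indices follow a stride-3 pattern and a sum is the goal; strided_sum is a made-up helper:

```julia
# Instead of materializing inds = [r, r+3, r+6, ...] and indexing with it,
# compute each index on the fly inside the loop -- zero allocations.
function strided_sum(data::Vector{Float64}, r::Int)
    s = 0.0
    for k in 0:100
        s += data[r + 3k]   # index computed as needed, no index array
    end
    s
end
```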