Large garbage collection in a for loop when slicing a vector

When creating a slice of a vector in a for loop, Julia appears to allocate a new vector each time, which then has to be dealt with by the garbage collector. In my setting, the GC can sometimes slow the program down significantly. As an MWE:

function f(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    for i in 1:n
        r = rand(1:length(data)-101)
        x = data[r:r+100]   # slicing copies: allocates a fresh 101-element vector each iteration
        out[i] = sum(x)
    end
    out
end
@benchmark f(data) setup=(data=randn(10000))
#  BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  103.800 ΞΌs …  1.461 ms  β”Š GC (min … max): 0.00% … 80.05%
#  Time  (median):     111.400 ΞΌs              β”Š GC (median):    0.00%
#  Time  (mean Β± Οƒ):   131.199 ΞΌs Β± 80.903 ΞΌs  β”Š GC (mean Β± Οƒ):  6.53% Β± 10.09%

#   β–ˆβ–‡β–„β–ƒβ–„β–‚β–‚β– ▁▁                                                  ▁
#   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–†β–†β–‡β–†β–…β–…β–„β–…β–…β–β–ƒβ–…β–ƒβ–ƒβ–„β–β–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–…β–‡β–‡β–†β–†β–…β–†β–†β–† β–ˆ
#   104 ΞΌs        Histogram: log(frequency) by time       584 ΞΌs <

#  Memory estimate: 882.94 KiB, allocs estimate: 1001.

The median of that timing is fine; however, the max time with 80% GC creates problems for me. For example, if I change to n=10_000_000, it takes 30 seconds to run instead of the roughly 1 second implied by the median time. I can also fix this by changing to x = @view data[r:r+100]. However, in my more specific setting I cannot guarantee that the slices will always be contiguous (it might be a random set of indices).

So my main question is, why does Julia create a new vector each time for x and then need to collect it? Is there a way to specify that I want to overwrite the same place in memory? Would that help? Is there another way to solve this?

Generally, pass an iterator to sum instead of materializing a slice first.
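
For instance, inside the MWE's loop (a minimal sketch of the idea):

r = rand(1:length(data)-101)
out[i] = sum(data[j] for j in r:r+100)   # generator: sum iterates lazily, no slice is materialized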

You can use a view even with arbitrary indices.
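
For example (a sketch; inds here stands for a hypothetical vector of arbitrary indices):

inds = rand(1:length(data), 100)   # non-contiguous, possibly repeated indices
s = sum(@view data[inds])          # the view wraps data and inds; the elements are not copied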

Like adding a β€œbuffer” argument?

julia> function g(data::Vector{Float64}, buffer::Vector{Float64}; n=1000)
           out = fill(0.0, n)
           for i in 1:n
               r = rand(1:length(data)-101)
               buffer .= @view data[r:r+100]
               out[i] = sum(buffer)
           end
           out
       end
g (generic function with 1 method)

julia> @benchmark g(data, buffer) setup=(data=randn(10000); buffer=Vector{Float64}(undef, 101))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  55.861 ΞΌs … 165.260 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     57.117 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   60.518 ΞΌs Β±   6.547 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–„β–ˆβ–†β–‚β–ƒβ–„β–„β–„β–„β–ƒ             ▁▅▅▂                                  ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–†β–†β–…β–†β–†β–†β–†β–†β–‡β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–†β–†β–†β–…β–†β–„β–…β–…β–†β–†β–†β–†β–†β–†β–…β–…β–†β–„β–†β–…β–…β–†β–…β–†β–†β–„β–†β–… β–ˆ
  55.9 ΞΌs       Histogram: log(frequency) by time      86.1 ΞΌs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

But just using x = @view ... is probably simpler.
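
I.e. the MWE with only the slice replaced by a view (a sketch; f_view is a made-up name):

function f_view(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    for i in 1:n
        r = rand(1:length(data)-101)
        x = @view data[r:r+100]   # a view into data, not a copy: nothing new for the GC
        out[i] = sum(x)
    end
    out
end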

Yes, though as I understand it, doing so can sometimes be slower when the slice is not contiguous in memory. Still, in my setting it might be worth it to avoid GC.

I was thinking more like:

function f(data::Vector{Float64}; n=1000)
    out = fill(0.0, n)
    x = Vector{Float64}(undef, 101)
    for i in 1:n
        r = rand(1:length(data)-101)
        x .= data[r:r+100]
        out[i] = sum(x)
    end
    out
end

But that doesn’t make any difference.

Is there any benefit in passing buffer in yours?

Only if you want to reuse the same buffer in other functions, i.e. if you use this pattern in several places. Otherwise, not really. And I’m still not sure it’s any better than just using x = @view ....

In this case, sum is more of an example than what I am actually trying to do.

This still allocates a copy of the data slice on the right-hand side before writing it to x.
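
Putting @view on the right-hand side as well avoids that copy:

x .= @view data[r:r+100]   # broadcast straight from the view into x, no temporary vector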

If you must collect the data into a buffer, then

buffer .= @view data[inds] 

is the way to go. There’s normally no cost to creating the view; this just retrieves the data points directly into the buffer.

But if you subsequently iterate over the buffer just a single time, for example with sum(buffer), then the copy is just a waste of time, and directly summing the view, sum(@view data[inds]), is as good as it gets. Then you traverse the data a single time, summing as you go; there is no need to traverse the data twice if once will do.

If the indices are non-contiguous and you will access the elements multiple times, collecting them into a buffer might be worth it.
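
Roughly (a sketch, assuming inds and a matching-length buffer as above; the two statistics are just stand-ins for “multiple passes”):

# One pass over the elements: sum the view directly, no buffer needed.
total = sum(@view data[inds])

# Several passes: collect once into the buffer, then reuse it.
buffer .= @view data[inds]
m = sum(buffer) / length(buffer)               # first pass
v = sum(abs2, buffer) / length(buffer) - m^2   # second pass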

Based on your example, a view seems to be exactly what you want.

PS. If you have an array of indices (instead of a range), it might be worthwhile to forgo the index array entirely and just compute the desired indices/elements one by one in a loop (either storing them into a buffer for subsequent calculations, or directly calculating something like a sum with them). There’s no need to bend over backwards to avoid writing your own performance-critical inner loops.
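
For instance, with a made-up access pattern, say every k-th element starting at r, a plain loop needs no index array at all:

function strided_sum(data::Vector{Float64}, r::Integer, k::Integer, m::Integer)
    s = 0.0
    for j in 0:m-1
        s += data[r + j*k]   # compute each index on the fly
    end
    s
end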
