Timeseries Optimization: Reducing allocations

Hello everyone!

I was working on a project and wanted to start refactoring the code to make it faster. As part of this process I am trying to reduce the number of allocations as much as possible. Currently I have a function that takes in a vector and another function, myfunc. Inside a for loop I call myfunc on a subset of the input vector. After playing around with benchmarking tools I noticed that allocations are being made both when calling myfunc and inside myfunc. My code is structured as follows:

function eval(timeseries::Vector{Float64}, myfunc)
    for i in 1:length(timeseries)
        alpha = myfunc(timeseries[1:i])
    end
end

I tried running it with a myfunc that just returns rand(1)[1] and was surprised by how many allocations were made. I'm guessing there is no way to reduce the allocations from the random number generation, but maybe there is a way to reduce the allocations from myfunc(timeseries[1:i]). I'm new to this whole optimization stuff, so maybe what I'm trying to do is simply impossible.

Anyway, thank you for taking the time to read this! Any feedback about optimization tips or how I might restructure my code would be greatly appreciated!

If the allocations are in myfunc you'd need to post some code for folks to try to figure out what's happening.

Yeah, for sure. I was testing with myfunc equal to random(timeseries) = rand(1)[1]. In the future I would like to replace myfunc with some arbitrary function, so I was wondering if there is a better way to rewrite my code to avoid the allocations from myfunc(timeseries[1:i]).
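Here is a runnable sketch of that test setup. One thing I noticed along the way: rand(1)[1] itself allocates a one-element array on every call, whereas plain rand() returns a scalar without allocating.

```julia
# rand(1) allocates a 1-element Vector just to take its first element;
# rand() returns a Float64 scalar directly, with no allocation.
random_alloc(timeseries) = rand(1)[1]  # allocates every call
random_noalloc(timeseries) = rand()    # scalar, no allocation
```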

The expression timeseries[1:i] allocates a copy of that subset of timeseries, which explains at least some of your allocation issues. Try @view timeseries[1:i] to create a lightweight view instead, and see Performance Tips · The Julia Language for more info.
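Applied to your function, that would look something like this (a sketch; eval_views is just an illustrative name, and myfunc stays a parameter):

```julia
# Sketch of the original loop using @view: the SubArray wraps the parent
# vector, so no copy of the data is made on each iteration.
function eval_views(timeseries::Vector{Float64}, myfunc)
    for i in eachindex(timeseries)
        alpha = myfunc(@view timeseries[1:i])
    end
    return nothing
end

eval_views(rand(100), sum)  # any function accepting an AbstractVector works
```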

4 Likes

Here are two things to keep in mind when indexing and looping (note that views are allocation-free since Julia 1.5):

  1. slicing like v[1:i] copies the data (an allocation); use a view, @view v[1:i], to reference the underlying elements without allocating. Use @views for a whole block of code instead of a single indexing operation.

  2. normal loops for i=1:j generate code with bounds checking on each array access. If you are sure your indices will always stay in bounds, use @inbounds for i=1:j to skip those checks.

Using a dummy function,

ts = rand(100)

function test1(ts)
    for i = 1:length(ts)
        dummy(ts[1:i])
    end
end

function test2(ts)
    for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function test3(ts)
    @inbounds for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function dummy(ts)
end

test1 shows 100 allocations

julia> @benchmark test1($ts)
BenchmarkTools.Trial: 
  memory estimate:  49.06 KiB
  allocs estimate:  100
  --------------
  minimum time:     6.150 μs (0.00% GC)
  median time:      8.975 μs (0.00% GC)
  mean time:        9.879 μs (7.94% GC)
  maximum time:     203.525 μs (90.07% GC)
  --------------
  samples:          10000
  evals/sample:     4

test2 has 0 allocations

julia> @benchmark test2($ts)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     65.000 ns (0.00% GC)
  median time:      65.102 ns (0.00% GC)
  mean time:        65.300 ns (0.00% GC)
  maximum time:     79.490 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     980

test3 has 0 allocations and is faster than test2

julia> @benchmark test3($ts)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.000 ns (0.00% GC)
  median time:      2.100 ns (0.00% GC)
  mean time:        2.130 ns (0.00% GC)
  maximum time:     25.200 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

The best source of these tips is the Performance Tips section of the Julia documentation.

1 Like

All good points, but note that those benchmarks are somewhat misleading. In particular, your last example gives a 2 ns result for iterating over 100 elements, which would mean each iteration takes about 1/10 of a clock cycle. That's probably not what's actually happening; more likely the compiler has optimized the entire loop into nothing, since it doesn't actually do anything. If you increase the length of ts, you'll notice that the runtime stays constant, further demonstrating that the compiler has defeated the benchmark.

@view and @inbounds are still good tools, but @inbounds won’t generally improve your actual code speed by a factor of 30.
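One way to keep the compiler from eliminating the loop entirely is to have the dummy function do some real work and return a value that the benchmark actually uses. A sketch (dummy2 and test4 are illustrative names):

```julia
# Give the callee observable work and accumulate its result, so the loop
# cannot be optimized away as dead code.
dummy2(v) = last(v)  # reads one element of the view

function test4(ts)
    s = 0.0
    @inbounds for i = 1:length(ts)
        s += dummy2(@view ts[1:i])  # the result escapes via `s`
    end
    return s
end
```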

3 Likes

Thank you for your suggestion, rdeits! I'll be sure to do that. Based on what I've been reading, it seems that I'll also have to be careful that myfunc doesn't modify timeseries in the future, since a view writes through to the parent array.

Thank you @DaymondLing for your suggestions and again @rdeits for the followup! Another technique I used was @code_warntype to make sure my types were stable.
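For anyone curious, a related way to check stability programmatically (a sketch; Base.return_types reports the inferred return types that @code_warntype would highlight):

```julia
# A type-stable function has a single concrete inferred return type;
# an unstable one infers a Union (what @code_warntype shows in red).
unstable(x) = x > 0 ? 1 : 1.0      # may return Int or Float64
stable(x)   = x > 0 ? 1.0 : -1.0   # always returns Float64

only(Base.return_types(stable, (Float64,)))    # Float64
only(Base.return_types(unstable, (Float64,)))  # Union{Float64, Int64}
```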

If you are passing a function as an argument, you could try forcing specialization with a type parameter.
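A sketch of what that looks like, applied to the function from the original post (eval_specialized is an illustrative name):

```julia
# The ::F where {F} annotation forces a specialized method to be compiled
# for each concrete function type passed in, even in cases where Julia's
# heuristics would otherwise avoid specializing on a Function argument.
function eval_specialized(timeseries::Vector{Float64}, myfunc::F) where {F}
    for i in eachindex(timeseries)
        alpha = myfunc(@view timeseries[1:i])
    end
    return nothing
end
```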

1 Like

Yes, of course; the code merely shows that the compiler removes the bounds check, which could result in some time savings. YMMV, obviously.

Thank you @jebej! I’ll take a look into it