Timeseries Optimization: Reducing allocations

Hello everyone!

I was working on a project and wanted to start refactoring the code to make it faster. As part of this process I am trying to reduce the number of allocations as much as possible. Currently I have a function that takes in a vector and another function, myfunc. Inside a for loop I call myfunc on a subset of the input vector. After playing around with benchmarking tools I noticed that allocations are being made both when calling myfunc and inside myfunc. My code is structured as follows:

function eval(timeseries::Vector{Float64}, myfunc)
    for i in 1:length(timeseries)
        alpha = myfunc(timeseries[1:i])
    end
end

I tried running it with a myfunc that just returns rand(1)[1] and was surprised by how many allocations were made. I'm guessing there is no way to reduce the allocations from the random number generation, but maybe there is a way to reduce the allocations from myfunc(timeseries[1:i]). I'm new to this whole optimization stuff, so maybe what I'm trying to do is simply impossible.

Anyway, thank you for taking the time to read this! Any feedback about optimization tips or how I might restructure my code would be greatly appreciated!

If the allocations are in myfunc you'd need to post some code for folks to try to figure out what's happening.

Yeah, for sure. I was testing with myfunc equal to random(timeseries) = rand(1)[1]. In the future I would like to replace myfunc with some arbitrary function, so I was wondering if there is a better way to rewrite my code to avoid the allocations from myfunc(timeseries[1:i]).
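Here is a runnable sketch of that test setup. One thing I noticed along the way: rand(1)[1] itself allocates a one-element array on every call, whereas plain rand() returns a scalar without allocating.

```julia
# rand(1) allocates a 1-element Vector just to take its first element;
# rand() returns a Float64 scalar directly, with no allocation.
random_alloc(timeseries) = rand(1)[1]  # allocates every call
random_noalloc(timeseries) = rand()    # scalar, no allocation
```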

The expression timeseries[1:i] allocates a copy of that subset of timeseries, which explains at least some of your allocation issues. Try @view timeseries[1:i] to create a lightweight view instead, and see Performance Tips · The Julia Language for more info.
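Applied to your function, that would look something like this (a sketch; eval_views is just an illustrative name, and myfunc stays a parameter):

```julia
# Sketch of the original loop using @view: the SubArray wraps the parent
# vector, so no copy of the data is made on each iteration.
function eval_views(timeseries::Vector{Float64}, myfunc)
    for i in eachindex(timeseries)
        alpha = myfunc(@view timeseries[1:i])
    end
    return nothing
end

eval_views(rand(100), sum)  # any function accepting an AbstractVector works
```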

4 Likes

Here are two things to keep in mind when indexing and looping (note that views are allocation-free since Julia 1.5):

  1. slicing like v[1:i] copies the data (an allocation); use a view, @view v[1:i], to reference the underlying elements without allocating. Use @views for a whole block of code instead of a single indexing operation.

  2. normal loops for i=1:j generate code with bounds checking on each array access. If you are sure your indices will always stay in bounds, use @inbounds for i=1:j to skip those checks.

Using a dummy function,

ts = rand(100)

function test1(ts)
    for i = 1:length(ts)
        dummy(ts[1:i])
    end
end

function test2(ts)
    for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function test3(ts)
    @inbounds for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function dummy(ts)
end

test1 shows 100 allocations

julia> @benchmark test1($ts)
BenchmarkTools.Trial: 
  memory estimate:  49.06 KiB
  allocs estimate:  100
  --------------
  minimum time:     6.150 μs (0.00% GC)
  median time:      8.975 μs (0.00% GC)
  mean time:        9.879 μs (7.94% GC)
  maximum time:     203.525 μs (90.07% GC)
  --------------
  samples:          10000
  evals/sample:     4

test2 has 0 allocations

julia> @benchmark test2($ts)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     65.000 ns (0.00% GC)
  median time:      65.102 ns (0.00% GC)
  mean time:        65.300 ns (0.00% GC)
  maximum time:     79.490 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     980

test3 has 0 allocations and is faster than test2

julia> @benchmark test3($ts)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.000 ns (0.00% GC)
  median time:      2.100 ns (0.00% GC)
  mean time:        2.130 ns (0.00% GC)
  maximum time:     25.200 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

The best source of these tips is the Performance Tips section of the Julia documentation.

1 Like

All good points, but note that those benchmarks are somewhat misleading. In particular, your last example gives a 2 ns result for iterating over 100 elements, which would mean each iteration takes about 1/10 of a clock cycle. That's probably not what's actually happening; more likely the compiler has optimized the entire loop into nothing, since it doesn't actually do anything. If you increase the length of ts, you'll notice that the runtime stays constant, further demonstrating that the compiler has defeated the benchmark.

@view and @inbounds are still good tools, but @inbounds won’t generally improve your actual code speed by a factor of 30.
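One way to keep the compiler from eliminating the loop entirely is to have the dummy function do some real work and return a value that the benchmark actually uses. A sketch (dummy2 and test4 are illustrative names):

```julia
# Give the callee observable work and accumulate its result, so the loop
# cannot be optimized away as dead code.
dummy2(v) = last(v)  # reads one element of the view

function test4(ts)
    s = 0.0
    @inbounds for i = 1:length(ts)
        s += dummy2(@view ts[1:i])  # the result escapes via `s`
    end
    return s
end
```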

3 Likes

Thank you for your suggestion, rdeits! I'll be sure to do that. Based on what I've been reading, it seems that I'll also have to be careful that myfunc doesn't modify timeseries in the future, since a view writes through to the parent array.

Thank you @DaymondLing for your suggestions and again @rdeits for the followup! Another technique I used was @code_warntype to make sure my types were stable.
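For anyone curious, a related way to check stability programmatically (a sketch; Base.return_types reports the inferred return types that @code_warntype would highlight):

```julia
# A type-stable function has a single concrete inferred return type;
# an unstable one infers a Union (what @code_warntype shows in red).
unstable(x) = x > 0 ? 1 : 1.0      # may return Int or Float64
stable(x)   = x > 0 ? 1.0 : -1.0   # always returns Float64

only(Base.return_types(stable, (Float64,)))    # Float64
only(Base.return_types(unstable, (Float64,)))  # Union{Float64, Int64}
```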

If you are passing a function as an argument, you could try forcing specialization with a type parameter.
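A sketch of what that looks like, applied to the function from the original post (eval_specialized is an illustrative name):

```julia
# The ::F where {F} annotation forces a specialized method to be compiled
# for each concrete function type passed in, even in cases where Julia's
# heuristics would otherwise avoid specializing on a Function argument.
function eval_specialized(timeseries::Vector{Float64}, myfunc::F) where {F}
    for i in eachindex(timeseries)
        alpha = myfunc(@view timeseries[1:i])
    end
    return nothing
end
```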

1 Like

Yes, of course; the code merely shows that the compiler removes the bounds check, which could result in some time savings. YMMV, obviously.

Thank you @jebej! I’ll take a look into it