I was working on a project and wanted to start refactoring the code to make it faster. As part of this process I am trying to reduce the number of allocations as much as possible. Currently I have a function that takes in a vector and another function, myfunc. Then inside a for loop I call myfunc on a subset of the input vector. After playing around with benchmark tools I noticed that allocations are being made both when calling myfunc and inside myfunc. My code is structured as follows:
function eval(timeseries::Vector{Float64}, myfunc)
    for i in 1:length(timeseries)
        alpha = myfunc(timeseries[1:i])
    end
end
I tried running it using a myfunc that just returns rand(1)[1] and was surprised by how many allocations were made. I'm guessing there is no way to avoid the allocation from generating the random number, but maybe there is a way to reduce the allocations from myfunc(timeseries[1:i]). I'm new to this whole optimization stuff, so maybe what I'm trying to do is simply impossible.
Anyway, thank you for taking the time to read this! Any feedback about optimization tips or how I might restructure my code would be greatly appreciated!
Yeah, for sure. I was testing it out with myfunc equal to random(timeseries) = rand(1)[1]. In the future I would like to replace myfunc with some arbitrary function, so I was wondering if there is a better way to rewrite my code to avoid the allocations from myfunc(timeseries[1:i]).
The slice timeseries[1:i] allocates a copy of that subset of timeseries, which explains at least some of your allocation issues. Try @view timeseries[1:i] to create a lightweight view instead, and see Performance Tips · The Julia Language for more info.
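A minimal sketch of the difference, using Base's @allocated macro (the function names below are illustrative, not from the thread):

```julia
# Two tiny functions that differ only in how they take the subset:
takes_copy(ts, i) = sum(ts[1:i])        # ts[1:i] allocates a fresh Vector
takes_view(ts, i) = sum(@view ts[1:i])  # @view reuses the same memory

const ts_demo = rand(1000)
takes_copy(ts_demo, 500)   # warm up first so compilation isn't measured
takes_view(ts_demo, 500)

println(@allocated takes_copy(ts_demo, 500))  # nonzero: the slice is copied
println(@allocated takes_view(ts_demo, 500))  # typically zero on Julia >= 1.5
```

The view version should report far fewer bytes, since no copy of the data is made.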
Here are two things to keep in mind when indexing (since Julia 1.5) and looping:
normal slicing v[i:j] copies the data (an allocation); use @view v[i:j] to create a view into the underlying elements (no allocation). Use @views for a whole block of code instead of a single indexing operation.
normal loops for i = 1:j generate code with bounds checking. If you are sure your code will always stay in bounds, use @inbounds for i = 1:j to remove the checks.
Using a dummy function,
ts = rand(100)

function test1(ts)
    for i = 1:length(ts)
        dummy(ts[1:i])
    end
end

function test2(ts)
    for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function test3(ts)
    @inbounds for i = 1:length(ts)
        dummy(@view ts[1:i])
    end
end

function dummy(ts)
end
All good points, but note that those benchmarks are somewhat misleading. In particular, your last example gives a 2 ns result for iterating over 100 elements, which would mean each iteration takes about 1/10 of a clock cycle. That's probably not what's actually happening; more likely the compiler has optimized the entire loop away, since it doesn't actually do anything. If you increase the length of ts, you'll notice that the runtime stays constant, further demonstrating that the compiler has defeated the benchmark.
@view and @inbounds are still good tools, but @inbounds won’t generally improve your actual code speed by a factor of 30.
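One way to keep the compiler from deleting the loop is to give the dummy function real work and use its result; a hedged sketch of that idea (the names dummy_sum and test4 are illustrative, not from the thread):

```julia
# dummy_sum actually computes something, and test4 accumulates the results,
# so dead-code elimination cannot remove the loop body.
dummy_sum(ts) = sum(ts)

function test4(ts)
    acc = 0.0
    @inbounds for i = 1:length(ts)
        acc += dummy_sum(@view ts[1:i])  # the result is used, so the loop must run
    end
    return acc
end

test4(rand(100))  # benchmark something like this instead of an empty-bodied loop
```

Benchmarking this version gives timings that actually scale with the length of ts.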
Thank you for your suggestion rdeits! I'll be sure to do that. Based on what I've been reading, it seems that I'll also have to be careful that myfunc doesn't modify timeseries in the future, since a view shares memory with the original array.
Thank you @DaymondLing for your suggestions and again @rdeits for the followup! Another technique I used was @code_warntype to make sure my types were stable.
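For reference, a minimal sketch of that kind of type-stability check (the toy functions below are just examples, not from the thread):

```julia
using InteractiveUtils  # provides @code_warntype (auto-loaded in the REPL)

unstable(x) = x > 0 ? 1 : 1.0    # may return Int or Float64: type-unstable
stable(x)   = x > 0 ? 1.0 : -1.0 # always returns Float64: type-stable

@code_warntype unstable(2.0)  # the Body line shows a Union, flagged in the output
@code_warntype stable(2.0)    # the Body line shows a concrete Float64
```

If the inferred return type shows up as a Union or an abstract type, that's the instability to chase down.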