I was fiddling around with the super fun Advent of Code this year and was surprised by an allocation in the day 6 puzzle.
```julia
input = [3,4,3,1,2]
histogram = [count(==(i),input) for i in 0:8]
phase(day) = mod(day-1,9)+1
@allocated for day in 1:256
    histogram[phase(day-2)] += histogram[phase(day)]
end
sum(histogram)
```
@time says there are 533 allocations (8.328 KiB) in the loop. Why is that? Or have I just benchmarked poorly?
Did you pass input and histogram as arguments to your function, or did you access them as global variables inside that function? This makes a big difference (the latter is much less performant), as also discussed in the page that I linked.
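To illustrate the point about globals: a non-const global can change its value, and therefore its type, at any time, so a function that reads it has to look it up and dispatch on it at run time. A const global has a fixed binding, so the compiler knows its type. A minimal sketch, where hist_global, HIST_CONST, step_global! and step_const! are hypothetical names and the update rule is the one from the question:

```julia
input = [3, 4, 3, 1, 2]
phase(day) = mod(day - 1, 9) + 1

# Non-const global: its type is not known when step_global! is compiled,
# so every access is dispatched at run time and intermediate Ints get boxed.
hist_global = [count(==(i), input) for i in 0:8]

# const global: the binding cannot change, so its type is known to the compiler.
const HIST_CONST = [count(==(i), input) for i in 0:8]

function step_global!()
    for day in 1:256
        hist_global[phase(day - 2)] += hist_global[phase(day)]
    end
    return nothing
end

function step_const!()
    for day in 1:256
        HIST_CONST[phase(day - 2)] += HIST_CONST[phase(day)]
    end
    return nothing
end

# Call each once so compilation is not counted, then measure.
step_global!(); step_const!()
println("non-const global: ", @allocated(step_global!()), " bytes")
println("const global:     ", @allocated(step_const!()), " bytes")
```

Passing the array as a function argument, as in the answers in this thread, avoids the problem just as well and keeps the function reusable.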
For quick stuff like this, you can use let blocks, or wrap everything in functions, like:
```julia
let
    # stuff
end
```

```julia
function main()
    # stuff
end
main() # call function
```
The latter is very nice if you use Revise and load the file with includet: in the REPL you can just keep running julia> main() while you change the body of the function.
```julia
julia> using BenchmarkTools

julia> @ballocated let
           input = [3,4,3,1,2]
           histogram = [count(==(i),input) for i in 0:8]
           phase(day) = mod(day-1,9)+1
           for day in 1:256
               histogram[phase(day-2)] += histogram[phase(day)]
           end
           sum(histogram)
       end
224
```
Everything in a function:
```julia
julia> function main()
           input = [3,4,3,1,2]
           histogram = [count(==(i),input) for i in 0:8]
           phase(day) = mod(day-1,9)+1
           for day in 1:256
               histogram[phase(day-2)] += histogram[phase(day)]
           end
           sum(histogram)
       end

julia> @ballocated main()
224
```
With the data defined outside the function, but passed as parameters:
```julia
julia> input = [3,4,3,1,2]
       histogram = [count(==(i),input) for i in 0:8]
       phase(day) = mod(day-1,9)+1
       function main(input,histogram)
           for day in 1:256
               histogram[phase(day-2)] += histogram[phase(day)]
           end
           sum(histogram)
       end
       @ballocated main($input,$histogram) # interpolate the input parameters!
0
```
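Comparing the three results: the 224 bytes in the first two versions come from building input and histogram inside the benchmarked code, while the loop itself allocates nothing. A minimal sketch that measures the two parts separately with Base's @allocated, called inside a function so that no untyped globals are involved; `make_histogram`, `evolve!` and `allocations` are hypothetical names, and the update rule is the one from the question:

```julia
phase(day) = mod(day - 1, 9) + 1

# Builds a fresh 9-element vector: entry k holds the number of fish with timer k-1.
# This is the part that allocates.
make_histogram(input) = [count(==(i), input) for i in 0:8]

# Advances the histogram in place for `days` days and returns the total count.
# It only reads and writes existing slots of the array.
function evolve!(histogram, days)
    for day in 1:days
        histogram[phase(day - 2)] += histogram[phase(day)]
    end
    return sum(histogram)
end

# Measure both parts inside a function, so everything has a concrete type.
function allocations(input)
    histogram = make_histogram(input)
    setup_bytes = @allocated make_histogram(input)
    loop_bytes = @allocated evolve!(histogram, 256)
    return setup_bytes, loop_bytes
end

input = [3, 4, 3, 1, 2]
allocations(input)                            # first call: compilation happens here
setup_bytes, loop_bytes = allocations(input)
println("setup: $setup_bytes bytes, loop: $loop_bytes bytes")
println(evolve!(make_histogram(input), 80))   # prints 5934
```

Keeping the in-place update in its own function also makes it easy to benchmark on its own, exactly as in the third version.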
I always use @ballocated from BenchmarkTools, to avoid counting compilation, etc. Note that it reports the number of bytes allocated, not the number of allocations.
@allocated is part of Base and needs neither BenchmarkTools nor interpolation, but it has its quirks. First, on the first call to a function it counts everything associated with compilation. Second, it may report allocations associated with returning the value from the function to the REPL. For example:
```julia
julia> f(x) = sum(x)
f (generic function with 1 method)

julia> x = rand(10);

julia> @allocated f(x)
4026434

julia> @allocated f(x)
16
```
On the first call it counted everything associated with compilation. On the second call it still reports 16 bytes, which are associated with returning the value to the REPL. These are correctly discounted by @ballocated when it is properly used, with the variable interpolated, as in @ballocated f($x).
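Without BenchmarkTools, both quirks can be sidestepped with plain @allocated by doing the measurement inside a function and calling it once before trusting the number. A minimal sketch, where `measure` is a hypothetical helper name:

```julia
f(x) = sum(x)

# Hypothetical helper: inside a function, `x` is an argument with a concrete
# type, so the call to f is resolved at compile time and its Float64 result
# does not have to be boxed.
measure(x) = @allocated f(x)

x = rand(10)
measure(x)            # first call: compiles `measure` (and `f` for this argument type)
println(measure(x))   # reports only what `f(x)` itself allocates
```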
I like to use TimerOutputs to track allocations in cases like this one. Here is how it could be used on your example:
```julia
using TimerOutputs

# global TimerOutput object
const to = TimerOutput()

# wrap everything in a function (benchmarking in the global scope is tricky)
function foo(input)
    phase(day) = mod(day-1,9)+1
    # annotate lines or code sections with @timeit
    @timeit to "histogram" histogram = [count(==(i),input) for i in 0:8]
    @timeit to "for" for day in 1:256
        histogram[phase(day-2)] += histogram[phase(day)]
    end
    @timeit to "sum" sum(histogram)
end

# Make sure everything is compiled
input = [3,4,3,1,2]
foo(input)

# Reset the timer and re-run to get meaningful results
reset_timer!(to)
foo(input)
print_timer(to)
```
The print_timer invocation above yields an output like