Motivation
I was reading @jakobnissen 's blog post about optimizing Julia code and came across this interesting tip:
Do not store fields that can easily be computed on demand.
In my code, I am working on a numerical scheme whose input data is a bunch of vectors having the same length, say x
and y
. I do a bunch of deterministic calculations, and there are certain quantities such as 1 - x[i]
and x[i] * y[i]
that are used repeatedly throughout the algorithms. So, I have written my code in such a way that first the user reads x
and y
into a struct, and this inner constructor performs these preliminary calculations and adds them as fields of the struct. Something like
struct HeavyVecPair
x::Vector{Float64}
oneminusx::Vector{Float64}
twominusx::Vector{Float64}
y::Vector{Float64}
oneminusy::Vector{Float64}
twominusy::Vector{Float64}
xy::Vector{Float64}
xplusy::Vector{Float64}
function HeavyVecPair(x, y)
@assert length(x) == length(y)
new(
x,
1 .- x,
2 .- x,
y,
1 .- y,
2 .- y,
x .* y,
x + y
)
end
end
For comparison, here is one that just contains the vectors.
struct LightVecPair
x::Vector{Float64}
y::Vector{Float64}
function LightVecPair(x, y)
@assert length(x) == length(y)
new(x, y)
end
end
Benchmark
To compare these two, I wrote the following benchmark to mimic my code.
using BenchmarkTools
function funnydot(v::HeavyVecPair)
sum(
v.x[i]
* v.oneminusx[i]
* v.twominusx[i]
* v.y[i]
* v.oneminusy[i]
* v.twominusy[i]
* v.xy[i]
* v.xplusy[i]
for i in eachindex(v.x)
)
end
function funnydot(v::LightVecPair)
sum(
v.x[i]
* (1 - v.x[i])
* (2 - v.x[i])
* v.y[i]
* (1 - v.y[i])
* (2 - v.y[i])
* v.x[i] * v.y[i]
* (v.x[i] + v.y[i])
for i in eachindex(v.x)
)
end
function main()
@info "Short vectors"
display(@benchmark funnydot(v) setup=(v = HeavyVecPair(rand(10), rand(10))))
display(@benchmark funnydot(v) setup=(v = LightVecPair(rand(10), rand(10))))
@info "Long vectors"
display(@benchmark funnydot(v) setup=(v = HeavyVecPair(rand(1000), rand(1000))))
display(@benchmark funnydot(v) setup=(v = LightVecPair(rand(1000), rand(1000))))
end
main()
The results suggest that the LightVecPair
is better for both short and long vectors, as predicted by the post:
[ Info: Short vectors
BenchmarkTools.Trial: 10000 samples with 983 evaluations.
Range (min β¦ max): 58.111 ns β¦ 105.739 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 58.769 ns β GC (median): 0.00%
Time (mean Β± Ο): 59.131 ns Β± 1.654 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββββββββββββ β ββββ β βββ ββ β
β
βββββββββββββββββββββββββββββββββββββ
βββββββ
βββββββ
β
ββββ
βββ
β
58.1 ns Histogram: log(frequency) by time 64 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
Range (min β¦ max): 39.406 ns β¦ 89.627 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 39.686 ns β GC (median): 0.00%
Time (mean Β± Ο): 40.134 ns Β± 1.859 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
β
ββββ
β β β β β ββββ β β β
βββββββββββββββββββββββββββββββ
βββββββββββββββββββ
βββββββ
ββ β
39.4 ns Histogram: log(frequency) by time 46 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[ Info: Long vectors
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min β¦ max): 3.281 ΞΌs β¦ 9.496 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 3.459 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 3.487 ΞΌs Β± 258.464 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββββββ β
βββββββββββββββ
ββ
β
ββββββββββββββββββββββββββββββββββββββ
ββ
β
β
3.28 ΞΌs Histogram: log(frequency) by time 4.65 ΞΌs <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min β¦ max): 2.792 ΞΌs β¦ 6.521 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 2.814 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 2.870 ΞΌs Β± 212.418 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββ ββ β β
ββββββββββββββββββββββββββββββββββββββ
β
ββββ
ββββββββββββββββ β
2.79 ΞΌs Histogram: log(frequency) by time 3.58 ΞΌs <
Memory estimate: 0 bytes, allocs estimate: 0.
Note that these times do not include the time that the HeavyVecPair
constructor spends performing the preliminary calculations, so in reality the performance cost with the HeavyVecPair
is even greater.
Conclusion
The advice in the blog post appears correct.
Questions
- Is this a well-designed benchmark?
- Are there any other factors in play here that I should consider before reworking the structs in my code base to be more like
LightVecPair
? - Are there any other optimizations I can make to the code above?