## Motivation

I was reading @jakobnissen 's blog post about optimizing Julia code and came across this interesting tip:

Do not store fields that can easily be computed on demand.

In my code, I am working on a numerical scheme whose input data is a bunch of vectors having the same length, say `x`

and `y`

. I do a bunch of deterministic calculations, and there are certain quantities such as `1 - x[i]`

and `x[i] * y[i]`

that are used repeatedly throughout the algorithms. So, I have written my code in such a way that first the user reads `x`

and `y`

into a struct, and this inner constructor performs these preliminary calculations and adds them as fields of the struct. Something like

```
struct HeavyVecPair
x::Vector{Float64}
oneminusx::Vector{Float64}
twominusx::Vector{Float64}
y::Vector{Float64}
oneminusy::Vector{Float64}
twominusy::Vector{Float64}
xy::Vector{Float64}
xplusy::Vector{Float64}
function HeavyVecPair(x, y)
@assert length(x) == length(y)
new(
x,
1 .- x,
2 .- x,
y,
1 .- y,
2 .- y,
x .* y,
x + y
)
end
end
```

For comparison, here is one that just contains the vectors.

```
struct LightVecPair
x::Vector{Float64}
y::Vector{Float64}
function LightVecPair(x, y)
@assert length(x) == length(y)
new(x, y)
end
end
```

## Benchmark

To compare these two, I wrote the following benchmark to mimic my code.

```
using BenchmarkTools
function funnydot(v::HeavyVecPair)
sum(
v.x[i]
* v.oneminusx[i]
* v.twominusx[i]
* v.y[i]
* v.oneminusy[i]
* v.twominusy[i]
* v.xy[i]
* v.xplusy[i]
for i in eachindex(v.x)
)
end
function funnydot(v::LightVecPair)
sum(
v.x[i]
* (1 - v.x[i])
* (2 - v.x[i])
* v.y[i]
* (1 - v.y[i])
* (2 - v.y[i])
* v.x[i] * v.y[i]
* (v.x[i] + v.y[i])
for i in eachindex(v.x)
)
end
function main()
@info "Short vectors"
display(@benchmark funnydot(v) setup=(v = HeavyVecPair(rand(10), rand(10))))
display(@benchmark funnydot(v) setup=(v = LightVecPair(rand(10), rand(10))))
@info "Long vectors"
display(@benchmark funnydot(v) setup=(v = HeavyVecPair(rand(1000), rand(1000))))
display(@benchmark funnydot(v) setup=(v = LightVecPair(rand(1000), rand(1000))))
end
main()
```

The results suggest that the `LightVecPair`

is better for both short and long vectors, as predicted by the post:

```
[ Info: Short vectors
BenchmarkTools.Trial: 10000 samples with 983 evaluations.
Range (min β¦ max): 58.111 ns β¦ 105.739 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 58.769 ns β GC (median): 0.00%
Time (mean Β± Ο): 59.131 ns Β± 1.654 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββββββββββββ β ββββ β βββ ββ β
β
βββββββββββββββββββββββββββββββββββββ
βββββββ
βββββββ
β
ββββ
βββ
β
58.1 ns Histogram: log(frequency) by time 64 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
Range (min β¦ max): 39.406 ns β¦ 89.627 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 39.686 ns β GC (median): 0.00%
Time (mean Β± Ο): 40.134 ns Β± 1.859 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
β
ββββ
β β β β β ββββ β β β
βββββββββββββββββββββββββββββββ
βββββββββββββββββββ
βββββββ
ββ β
39.4 ns Histogram: log(frequency) by time 46 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
[ Info: Long vectors
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min β¦ max): 3.281 ΞΌs β¦ 9.496 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 3.459 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 3.487 ΞΌs Β± 258.464 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββββββ β
βββββββββββββββ
ββ
β
ββββββββββββββββββββββββββββββββββββββ
ββ
β
β
3.28 ΞΌs Histogram: log(frequency) by time 4.65 ΞΌs <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min β¦ max): 2.792 ΞΌs β¦ 6.521 ΞΌs β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 2.814 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 2.870 ΞΌs Β± 212.418 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββ ββ β β
ββββββββββββββββββββββββββββββββββββββ
β
ββββ
ββββββββββββββββ β
2.79 ΞΌs Histogram: log(frequency) by time 3.58 ΞΌs <
Memory estimate: 0 bytes, allocs estimate: 0.
```

Note that these times do not include the time that the `HeavyVecPair`

constructor spends performing the preliminary calculations, so in reality the performance cost with the `HeavyVecPair`

is even greater.

## Conclusion

The advice in the blog post appears correct.

## Questions

- Is this a well-designed benchmark?
- Are there any other factors in play here that I should consider before reworking the structs in my code base to be more like
`LightVecPair`

? - Are there any other optimizations I can make to the code above?