milesf
October 4, 2019, 11:13pm
1
Wanted to highlight a situation where mutable structs offer better performance than immutable structs.
If sorting an array of structs with many isbits
fields, the immutable copy cost outweighs the mutable indirection cost. It’s faster to just copy the mutable struct pointers. Benchmarks show a 2x performance gain for structs with 20 fields. Breakeven point is at about 6 fields.
Documentation recommends immutable structs for performance. Perhaps situations favoring mutable structs should also be noted.
# -------- Setup -------------
using BenchmarkTools
using Random
struct LotsOfFields
a::Int64
b::Int64
c::Int64
d::Int64
e::Int64
f::Int64
g::Int64
h::Int64
i::Int64
j::Int64
k::Int64
l::Int64
m::Int64
n::Int64
o::Int64
p::Int64
q::Int64
r::Int64
s::Int64
end
struct ImmutableBig
a::Int64
f::LotsOfFields
end
mutable struct MutableBig
a::Int64
f::LotsOfFields
end
function Random.rand(rng::AbstractRNG, ::Random.SamplerType{ImmutableBig})
ImmutableBig( rand(rng, Int), LotsOfFields(rand(rng, Int, 19)... ))
end
function Random.rand(rng::AbstractRNG, ::Random.SamplerType{MutableBig})
MutableBig( rand(rng, Int), LotsOfFields(rand(rng, Int, 19)... ))
end
# -------- Benchmarks ------
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
memory estimate: 528 bytes
allocs estimate: 2
--------------
minimum time: 701.153 ns (0.00% GC)
median time: 776.128 ns (0.00% GC)
mean time: 852.190 ns (7.63% GC)
maximum time: 339.048 μs (99.71% GC)
--------------
samples: 10000
evals/sample: 144
julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
memory estimate: 7.95 KiB
allocs estimate: 2
--------------
minimum time: 1.548 μs (0.00% GC)
median time: 1.751 μs (0.00% GC)
mean time: 2.704 μs (33.04% GC)
maximum time: 5.233 ms (99.80% GC)
--------------
samples: 10000
evals/sample: 10
2 Likes
oheil
October 5, 2019, 11:39am
2
Your example is a little bit complex, especially the setup
keyword. Do you mind to explain this syntax:
sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
or even better, could you link to the documentation regarding setup
? I didn’t found anything.
Here is a more simple example building up on yours:
julia> function swap!(x::AbstractArray)
tmp=x[1]
x[1]=x[2]
x[2]=tmp
end
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 22.730 ns (0.00% GC)
median time: 24.564 ns (0.00% GC)
mean time: 25.105 ns (0.00% GC)
maximum time: 151.784 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial:
memory estimate: 176 bytes
allocs estimate: 1
--------------
minimum time: 61.593 ns (0.00% GC)
median time: 64.159 ns (0.00% GC)
mean time: 77.737 ns (11.15% GC)
maximum time: 44.462 μs (99.80% GC)
--------------
samples: 10000
evals/sample: 1000
This implies that copying ImmutableBig elements is more expensive than MutableBig elements of an array (as you already mentioned).
milesf
October 5, 2019, 6:41pm
3
The BenchmarkTools quickstart page and manual have documentation for setup
.
For your example, you’ll also want to interpolate xs
with $
to avoid additional overhead. 3x performance difference in this case.
# --- Without interpolation
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 13.416 ns (0.00% GC)
median time: 13.754 ns (0.00% GC)
mean time: 14.053 ns (0.00% GC)
maximum time: 45.129 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
# --- With interpolation
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!($xs)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 3.856 ns (0.00% GC)
median time: 3.951 ns (0.00% GC)
mean time: 4.445 ns (0.00% GC)
maximum time: 35.397 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
# Immutable comparison with interpolation
julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark swap!($xs)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.755 ns (0.00% GC)
median time: 5.780 ns (0.00% GC)
mean time: 5.934 ns (0.00% GC)
maximum time: 39.518 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
oheil
October 5, 2019, 7:21pm
4
Yes, forgot that. But still copying ImmutableBig is more expensive, but I don’t know why.
Thanks for the link to setup.
milesf
October 5, 2019, 7:38pm
5
My understanding is that copying the immutable struct requires copying all of the fields, but copying the mutable struct only requires copying a pointer to the struct’s “container”.
I would be curious if you could redo the example with an array of Ref{LotsOfFields}
, with immutable LosOfFields
.
2 Likes
milesf
October 5, 2019, 8:44pm
7
I believe using Ref
is just a more tedious alternative to making LotsOfFields
mutable.
It’s easy to test with a mutable LotsOfFields
. This makes ImmutableBig
behave more like MutableBig
.
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
memory estimate: 528 bytes
allocs estimate: 2
--------------
minimum time: 671.962 ns (0.00% GC)
median time: 678.618 ns (0.00% GC)
mean time: 797.921 ns (5.21% GC)
maximum time: 225.633 μs (99.64% GC)
--------------
samples: 10000
evals/sample: 157
julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
memory estimate: 528 bytes
allocs estimate: 2
--------------
minimum time: 673.392 ns (0.00% GC)
median time: 718.771 ns (0.00% GC)
mean time: 776.473 ns (5.24% GC)
maximum time: 221.492 μs (99.62% GC)
--------------
samples: 10000
evals/sample: 153
1 Like