Situations where mutable struct is more performant than immutable struct

Wanted to highlight a situation where mutable structs offer better performance than immutable structs.

If sorting an array of structs with many isbits fields, the immutable copy cost outweighs the mutable indirection cost. It’s faster to just copy the mutable struct pointers. Benchmarks show a 2x performance gain for structs with 20 fields. Breakeven point is at about 6 fields.

Documentation recommends immutable structs for performance. Perhaps situations favoring mutable structs should also be noted.

# -------- Setup -------------
using BenchmarkTools
using Random

struct LotsOfFields 
    a::Int64
    b::Int64
    c::Int64
    d::Int64
    e::Int64
    f::Int64
    g::Int64
    h::Int64
    i::Int64
    j::Int64
    k::Int64
    l::Int64
    m::Int64
    n::Int64
    o::Int64
    p::Int64
    q::Int64
    r::Int64
    s::Int64
end

struct ImmutableBig
    a::Int64
    f::LotsOfFields
end

mutable struct MutableBig
    a::Int64
    f::LotsOfFields
end

function Random.rand(rng::AbstractRNG, ::Random.SamplerType{ImmutableBig})
    ImmutableBig( rand(rng, Int), LotsOfFields(rand(rng, Int, 19)... ))
end

function Random.rand(rng::AbstractRNG, ::Random.SamplerType{MutableBig})
    MutableBig( rand(rng, Int), LotsOfFields(rand(rng, Int, 19)... ))
end

# -------- Benchmarks ------
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial: 
  memory estimate:  528 bytes
  allocs estimate:  2
  --------------
  minimum time:     701.153 ns (0.00% GC)
  median time:      776.128 ns (0.00% GC)
  mean time:        852.190 ns (7.63% GC)
  maximum time:     339.048 μs (99.71% GC)
  --------------
  samples:          10000
  evals/sample:     144

julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial: 
  memory estimate:  7.95 KiB
  allocs estimate:  2
  --------------
  minimum time:     1.548 μs (0.00% GC)
  median time:      1.751 μs (0.00% GC)
  mean time:        2.704 μs (33.04% GC)
  maximum time:     5.233 ms (99.80% GC)
  --------------
  samples:          10000
  evals/sample:     10
2 Likes

Your example is a little bit complex, especially the setup keyword. Do you mind to explain this syntax:

sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))

or even better, could you link to the documentation regarding setup? I didn’t found anything.

Here is a more simple example building up on yours:

julia> function swap!(x::AbstractArray)
       tmp=x[1]
       x[1]=x[2]
       x[2]=tmp
       end

julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     22.730 ns (0.00% GC)
  median time:      24.564 ns (0.00% GC)
  mean time:        25.105 ns (0.00% GC)
  maximum time:     151.784 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial:
  memory estimate:  176 bytes
  allocs estimate:  1
  --------------
  minimum time:     61.593 ns (0.00% GC)
  median time:      64.159 ns (0.00% GC)
  mean time:        77.737 ns (11.15% GC)
  maximum time:     44.462 μs (99.80% GC)
  --------------
  samples:          10000
  evals/sample:     1000

This implies that copying ImmutableBig elements is more expensive than MutableBig elements of an array (as you already mentioned).

The BenchmarkTools quickstart page and manual have documentation for setup.

For your example, you’ll also want to interpolate xs with $ to avoid additional overhead. 3x performance difference in this case.

# --- Without interpolation
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!(xs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     13.416 ns (0.00% GC)
  median time:      13.754 ns (0.00% GC)
  mean time:        14.053 ns (0.00% GC)
  maximum time:     45.129 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

# --- With interpolation
julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark swap!($xs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.856 ns (0.00% GC)
  median time:      3.951 ns (0.00% GC)
  mean time:        4.445 ns (0.00% GC)
  maximum time:     35.397 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

# Immutable comparison with interpolation
julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark swap!($xs)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.755 ns (0.00% GC)
  median time:      5.780 ns (0.00% GC)
  mean time:        5.934 ns (0.00% GC)
  maximum time:     39.518 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Yes, forgot that. But still copying ImmutableBig is more expensive, but I don’t know why.
Thanks for the link to setup.

My understanding is that copying the immutable struct requires copying all of the fields, but copying the mutable struct only requires copying a pointer to the struct’s “container”.

I would be curious if you could redo the example with an array of Ref{LotsOfFields}, with immutable LosOfFields.

2 Likes

I believe using Ref is just a more tedious alternative to making LotsOfFields mutable.
It’s easy to test with a mutable LotsOfFields. This makes ImmutableBig behave more like MutableBig.

julia> Random.seed!(0); xs = rand(MutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
  memory estimate:  528 bytes
  allocs estimate:  2
  --------------
  minimum time:     671.962 ns (0.00% GC)
  median time:      678.618 ns (0.00% GC)
  mean time:        797.921 ns (5.21% GC)
  maximum time:     225.633 μs (99.64% GC)
  --------------
  samples:          10000
  evals/sample:     157

julia> Random.seed!(0); xs = rand(ImmutableBig, 100); @benchmark sort!(arr, by=(x -> x.a)) setup=(arr = copy(xs))
BenchmarkTools.Trial:
  memory estimate:  528 bytes
  allocs estimate:  2
  --------------
  minimum time:     673.392 ns (0.00% GC)
  median time:      718.771 ns (0.00% GC)
  mean time:        776.473 ns (5.24% GC)
  maximum time:     221.492 μs (99.62% GC)
  --------------
  samples:          10000
  evals/sample:     153
1 Like