Container of mutable structs without 100x slowdown?

It seems like when mutable structs are stored in any kind of container, Julia is unable to optimize them onto the stack or even to a packed buffer on the heap. It always allocates new memory on the heap for each individual struct.

Is there any alternative when I have a container of structs and need to be able to mutate some values?

struct S_i
  x::Int
end

mutable struct S_m
  x::Int
end

function f(v::Vector{T}) where T
  s = 0
  for i in 1:10^7
    v[1] = T(i)
    s += v[1].x
  end
  s
end
julia> @btime f([S_i(1)])
  398.516 ÎĽs (1 allocation: 64 bytes)
50000005000000

julia> @btime f([S_m(1)])
  46.361 ms (10000002 allocations: 152.59 MiB)
50000005000000

What you’re measuring here is essentially the cost of creating a new T instance at each iteration, a cost that is negligible for immutable structures, and large for mutable structures. But presumably you want your structure to be mutable because you want to mutate it at each iteration, instead of creating a new instance at each iteration.

Let’s take your example, modified in such a way that the structures have two fields (so that it is easier to see the differences between mutation and instance creation:

struct S_i
    x :: Int
    y :: Int
end

function f!(v::Vector{T}) where T
    s = 0
    for i in 1:10^7
        v[1] = T(i, v[1].y)
        s += v[1].x
    end
    s
end

This performs well for a vector of immutable structures, in that it does not allocate:

julia> @btime f(v_i)  setup=(v_i=[S_i(1, 42)]) evals=1
  2.452 ms (0 allocations: 0 bytes)
50000005000000

Of course, you can use the exact same function with a mutable struct, but it allocates one new object for each iteration, and is therefore very inefficient:

julia> mutable struct S_m
           x :: Int
           y :: Int
       end

julia> @btime f!(v_m)  setup=(v_m=[S_m(1, 42)]) evals=1
  44.628 ms (10000000 allocations: 305.18 MiB)
50000005000000

Fortunately, in this case, you can mutate the objects that are already stored in the vector:

function f_mut!(v)
    s = 0
    for i in 1:10^7
        v[1].x = i  # nicer syntax: we only want to mutate `x`; `y` is left alone
        s += v[1].x
    end
    s
end

And it again avoids all allocations:

julia> @btime f_mut!(v_m)  setup=(v_m=[S_m(1, 42)]) evals=1
  3.469 ms (0 allocations: 0 bytes)
50000005000000

Note that we could have had a nice enough syntax in the immutable case, where we would only have to tell what to modify, and the unchanged fields would be taken care of automatically. Accessors.jl is one way of doing this:

using Accessors
function f_accessors!(v)
    s = 0
    field_x = @optic _.x
    for i in 1:10^7
        v[1] = set(v[1], field_x, i)
        s += v[1].x
    end
    s
end
julia> @btime f_accessors!(v_i)  setup=(v_i=[S_i(1, 42)]) evals=1
  3.968 ms (0 allocations: 0 bytes)
50000005000000

Note that all non-allocating versions have performances in the same ballpark. I wouldn’t try to look too much into the small performance differences observed in this case, since we’re only manipulating a 1-element vector. Larger performance differences may (or may not, I’m not sure) appear when manipulating large vectors.

7 Likes

There is Setfield.jl which has a simpler syntax, apparently: @set! s.x = 1

Irrespective of the wonderful answer by @ffevotte regarding allocations above, there will probably still be some lingering performance gap between an immutable struct and a mutable one in the extreme cases. The reason for this is that mutable structs are guaranteed to be distinct objects when you create a new one, that must be able to be safely referenced without the array they are possibly in. For example:

julia> mutable struct MyStruct
         f::Int
       end

julia> ms = MyStruct(1)
MyStruct(1)

julia> ms_Arr = [ ms ]
1-element Vector{MyStruct}:
 MyStruct(1)

julia> ms_Arr[1] === ms
true

This may seem obvious, but has some implications:

julia> function foo(m::MyStruct, n::Int)
           s = 0
           for _ in 1:n
              s += m.f
              m.f = s
           end
       end
               
foo (generic function with 1 method)

julia> foo(ms, 10)

julia> ms_Arr[1]
MyStruct(512)

This always has to work, and because of this structs that are mutable (and some others) can’t be stored inline in an array. The only time it would be possible to safely store instances of MyStruct inline in the array is if both the array and its elements have the same lifetime, i.e. they have no way of existing without each other. At the moment, julia does not have such an optimization, which in general is pretty hard to do (Rust can do it, due to the semantics they’ve chosen and their borrow checker).

Due to this not being stored inline and internally being stored as a reference (invisible to the user, so this is allowed to change in some cases in the future), you’re always going to pay that dereferencing cost.

3 Likes

Yes, Accessors.jl is actually the successor of SetField.jl (but the latter is still maintained).

(I’m not a very proficient user of SetField / Accessors, so take the following with a grain of salt)

Accessors.jl also provides the nice syntax @set s.x = 1, but I don’t think it works when s is replaced by something like a[i].

In other words, I think @set a[i].s = 1 is interpreted as : “give me a copy of a in which the x field of the i-th element is set to 1”. As opposed to “give me a copy of a[i] in which the x field is set to 1” (which would be the desired semantics here).

2 Likes

Thanks for all of the explanations and suggestions.

For my actual use-case (a Dict of structs with 5 primitive fields), where I’m adding and removing new structs billions of times, but rarely modifying their fields, the suggestion of using immutable structs and just creating new immutables whenever I need to modify makes sense and the overhead of writing 5 fields rather than just the 1 I actually want to change will be minimal.

I would still be nice to have the low-level capabilities of Rust, but I guess there are usually reasonable workarounds.

2 Likes

If it helps, think about what it takes for a type to be mutable. It’s not just to change values; reassignments like x += 1 and v[1] = T(i) accomplish something similar. It’s that references can share a true mutation, e.g. x = y; y.a = 1 the 1 change is visible to both x and y. Reassignment cannot do that, e.g. x = y; y += 1.

To make mutation efficient, a mutable instance must only have 1 copy in 1 place; if the data were copied to many places (like an immutable instance), then every copy would need to be linked to each other and each be mutated the exact same way. That 1 place doesn’t actually have to be on the heap; if the instance is known to be limited to a particular scope and has a fixed size, it can go on the stack.

For memory management to be efficient, a mutable instance should be allocated independently. Say you return only a few elements of an array e.g. x = A[2]; y = x; return x, A[15], y. You either 1) are forced to keep A’s array allocated because that’s where the elements live, 2) reallocate independent copies of the elements and somehow update the addresses of every live reference (x, y) before garbage collecting A’s array, or 3) allocate the elements independently from the beginning. Optimizations are hypothetically possible, but they require proof that the code doesn’t do something like this example.

Mutable types aren’t the default struct definition for a good reason; their relative inefficiency is a necessary cost to allow many references to share a mutation. If you don’t need that feature, reassigning immutable instances is good enough for “changing data.” My personal question is, is the compiler capable of just modifying the data for a particular field instead of reconstructing a whole instance, like the semantics of SetField/Accessors?

2 Likes

The @set syntax is also available in Accessors, of course. As @ffevotte says, it’s the “official” Setfield successor. The exact replacement for @set! in Setfield is @reset in Accessors.

I had a proposal once to make it possible to specify the actual target explicitly in @set, but the syntax is not totally intuitive. The basic @set a[i].x = 1 returns the copy of a with x field of ith element set to 1 - what else could it mean? @set $(a[i]).x = 1 could mean “return copy of a[i] with x = 1” with support explicit specification of the optic target by aplavin · Pull Request #55 · JuliaObjects/Accessors.jl · GitHub…

2 Likes