Alternative to mutable named tuple?

For many of my ad-hoc projects I’ve started using Vectors of NamedTuples rather than Vectors of structs because I get the same performance with the benefit of being able to slightly nicer syntax and being able to change their definition without having to reload the entire Julia session.

In some cases I need to update a single field many times, so I’d prefer to use a mutable object for cleaner syntax and to save on memory bandwidth. The obvious solution is a mutable struct, but that has the disadvantages mentioned before.

Is there a fundamental reason why NamedTuples can’t be made mutable? Is it possible/reasonable to write a package with this functionality? Or does one already exist somewhere?

1 Like

Have you looked at https://github.com/JuliaArrays/StructArrays.jl?

4 Likes

Setfield.jl probably your use-case quite nicely at the expense of an extra @set before the assignment:

EDIT: This use-case is certainly one that I feel like Setfield.jl should solve, but at present it does not. It doesn’t seem to include a good way of dealing with a datastructure which is partially mutable and partially immutable. I’ve opened an issue about doing so:

julia> using Setfield

julia> x = [(a=rand(), b=rand()) for _=1:5]
5-element Array{NamedTuple{(:a, :b),Tuple{Float64,Float64}},1}:
 (a = 0.8832109672741486, b = 0.5778852629704625)
 (a = 0.8694209006704752, b = 0.2851362925386456)
 (a = 0.6117367347350626, b = 0.5262148615818458)
 (a = 0.483913668604423, b = 0.5401690020155581)
 (a = 0.016761730101063632, b = 0.7422138498413362)

julia> @set x[2].a = 5.0
5-element Array{NamedTuple{(:a, :b),Tuple{Float64,Float64}},1}:
 (a = 0.8832109672741486, b = 0.5778852629704625)
 (a = 5.0, b = 0.2851362925386456)
 (a = 0.6117367347350626, b = 0.5262148615818458)
 (a = 0.483913668604423, b = 0.5401690020155581)
 (a = 0.016761730101063632, b = 0.7422138498413362)

julia> x
5-element Array{NamedTuple{(:a, :b),Tuple{Float64,Float64}},1}:
 (a = 0.8832109672741486, b = 0.5778852629704625)
 (a = 0.8694209006704752, b = 0.2851362925386456)
 (a = 0.6117367347350626, b = 0.5262148615818458)
 (a = 0.483913668604423, b = 0.5401690020155581)
 (a = 0.016761730101063632, b = 0.7422138498413362)

You’ll almost certainly get better performance doing this than if you cooked up some sort of mutable named tuple.

1 Like

One of the HUGE advantages of an immutable type is precisely their memory bandwidth! They can be stored directly in an array as they don’t have behavior-over-time. In other words, when you have an array like this, you can see its values are just stored directly and sequentially:

julia> x = [(a=1, b=2), (a=3, b=4)]
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> reinterpret(Int, x)
4-element reinterpret(Int64, ::Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}):
 1
 2
 3
 4

If you have a mutable type, that means that you can observe changes from elsewhere — and the vector can’t be its one “canonical” home. Thus, the vector now just holds onto pointers to independently allocated mutable structs that are no longer contiguous in memory.

The ability to update immutables inside containers like this would be a really nice language feature: WIP: Make mutating immutables easier by Keno · Pull Request #21912 · JuliaLang/julia · GitHub

5 Likes
julia> Setfield.setindex.(x, 2, :a)
5-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
 (a = 2, b = 0.9186393557427228)
 (a = 2, b = 0.7357657673876994)
 (a = 2, b = 0.5211924226841096)
 (a = 2, b = 0.8451059842404511)
 (a = 2, b = 0.29146593124174935)
1 Like

Yeah, but that returns a completely new array.

Not sure this is useful to anyone, but I had fun writing it: MutableNamedTuples.jl. This is a named tuple which tries to act like a regular named tuple, but secretly stores each of it’s fields as Refs to make them mutable.

julia> using MutableNamedTuples

julia> mnt = MutableNamedTuple(a=1, b= 2)
MutableNamedTuple(a = 1, b = 2)

julia> mnt.a = 2
2

julia> mnt
MutableNamedTuple(a = 2, b = 2)

and in arrays

julia> const MNT = MutableNamedTuple
MutableNamedTuple

julia> A = [MNT(a = 1, b=2) MNT(a=3, b=6)
            MNT(a =-1, b=4) MNT(a=1, b=1)]
2×2 Array{MutableNamedTuple{(:a, :b),Tuple{Base.RefValue{Int64},Base.RefValue{Int64}}},2}:
 MutableNamedTuple(a = 1, b = 2)   MutableNamedTuple(a = 3, b = 6)
 MutableNamedTuple(a = -1, b = 4)  MutableNamedTuple(a = 1, b = 1)

julia> A[1, 1].a = 100
100

julia> A
2×2 Array{MutableNamedTuple{(:a, :b),Tuple{Base.RefValue{Int64},Base.RefValue{Int64}}},2}:
 MutableNamedTuple(a = 100, b = 2)  MutableNamedTuple(a = 3, b = 6)
 MutableNamedTuple(a = -1, b = 4)   MutableNamedTuple(a = 1, b = 1)

This is exactly the sort of thing that StructArrays.jl is great for though, and it probably does it better.

4 Likes

I’m going to just follow up on my earlier post with an example:

using StructArrays
julia> x = [(a=1, b=2), (a=3, b=4)]
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> sx = StructArray(x)
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> sx.b .= 1
2-element Array{Int64,1}:
 1
 1

julia> sx
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
 (a = 1, b = 1)
 (a = 3, b = 1)

It doesn’t get easier than that. As far as memory usage, I think you probably shouldn’t worry about it unless these are huge vectors, in which case you probably want to do something different entirely.

6 Likes

As you’ve already found out, Setfield.jl has enough mechanisms to allow this but there is no high-level easy-to-use interface yet. FYI, BangBang wraps the low-level interface and lets you mutate (or widen) array of named tuples:

julia> using BangBang

julia> xs = [(a=1, b=2), (a=3, b=4)]
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> @set!! xs[1].b = 200;

julia> xs
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Int64}},1}:
 (a = 1, b = 200)
 (a = 3, b = 4)

See: https://tkf.github.io/BangBang.jl/dev/#BangBang.SetfieldImpl.@set!!-Tuple{Any}

If this is the motivation, I think StructArrays.jl is the best choice as tbeason mentioned.

4 Likes

Thanks for all of the suggestions! A StructArray of NamedTuples is probably the best compromise. The only slight disadvantage is that it would be better to keep all of the fields of a single tuple in a contiguous block of memory. The arrays will be up to 10^9 in size, so trying to make things as efficient as possible.

But why couldn’t the mutable arrays be stored in the same way if the elements are of a known fixed size at JIT compile time? Is it a limitation of the GC? If I had a pointer to the first element of an array of NamedTuples with 2 Float64 values, what’s to stop me from independently updating the values of the ith tuple at the memory addresses of ptr+16*i and ptr+16*i+8?

If you want array-of-structs:

julia> xs = [(a=1, b=2.0), (a=3, b=4.0)]
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
 (a = 1, b = 2.0)
 (a = 3, b = 4.0)

julia> unsafe_store!(Ptr{Float64}(pointer(xs, 2) + fieldoffset(eltype(xs), 2)), 4000.0)
Ptr{Float64} @0x00007f4b332dd9c8

julia> xs
2-element Array{NamedTuple{(:a, :b),Tuple{Int64,Float64}},1}:
 (a = 1, b = 2.0)
 (a = 3, b = 4000.0)

Wrapping this in a safer API would be nice. See also:

2 Likes

I think this is what MemoryMutate.jl does (wrapped up and safer), but I haven’t used it myself, so I can’t vouch for it (in fact, the only thing I know about it is that at some point in the past, I saved a bookmark to it).

2 Likes

Continuous array indexing doesn’t look like it’s implemented yet.
Blobs.jl looks similar and better maintained, but I think requires you to get the element pointer yourself.

Edit: I do love how as an example MemoryMutate basically provided an alternative way to implement MArray. Would be interesting to see how the approaches compare in compiler-friendliness.

2 Likes

I think it’s fairly easy to come up with composable Ref based on the idea @vchuravy mentioned:

using Base: fieldindex, unsafe_convert

struct RefField2{f,T,R <: Ref} <: Ref{T}
    x::R
end
function RefField2(r::Ref{T}, f::Int) where {T}
    @assert isbitstype(fieldtype(T, f))
    return RefField2{f, fieldtype(T, f), typeof(r)}(r)
end
RefField2(r::Ref, name::Symbol) = RefField2(r, fieldindex(_reftype(r), name))

_inner(r::RefField2) = getfield(r, :x)
_reftype(::Ref{T}) where {T} = T

function Base.unsafe_convert(::Type{Ptr{T}}, r::RefField2{f, T}) where {f, T}
    iref = _inner(r)
    p::Ptr{T} = unsafe_convert(Ptr{_reftype(iref)}, iref)
    return p + fieldoffset(_reftype(iref), f)
end

Base.pointer(r::RefField2) = unsafe_convert(Ptr{_reftype(r)}, r)

Base.getproperty(r::RefField2, name::Symbol) = RefField2(r, name)
Base.getindex(r::RefField2) = unsafe_load(pointer(r))
Base.setindex!(r::RefField2, x) = unsafe_store!(pointer(r), convert(_reftype(r), x))

struct RefArray2{T,A<:Array{T}} <: Ref{T}
    x::A
    i::Int
    @inline function RefArray2(A::Array, i::Int)
        @boundscheck checkbounds(A, i)
        return new{eltype(A),typeof(A)}(A, i)
    end
end

RefArray2(A::Array, I) = RefArray2(A, Base.to_index(A, I))

_array(r) = getfield(r, :x)
_index(r) = getfield(r, :i)

Base.unsafe_convert(::Type{Ptr{T}}, r::RefArray2{T}) where {T} =
    pointer(_array(r), _index(r))

Base.pointer(r::RefArray2) = unsafe_convert(Ptr{_reftype(r)}, r)

Base.getproperty(r::RefArray2, name::Symbol) = RefField2(r, name)
Base.getindex(r::RefArray2) = unsafe_load(pointer(r))

Then

julia> xs = [(a=(b=(c=x, d=2x), e=3x), f=4x) for x in 1:2]
2-element Array{NamedTuple{(:a, :f),Tuple{NamedTuple{(:b, :e),Tuple{NamedTuple{(:c, :d),Tuple{Int64,Int64}},Int64}},Int64}},1}:
 (a = (b = (c = 1, d = 2), e = 3), f = 4)
 (a = (b = (c = 2, d = 4), e = 6), f = 8)

julia> r = RefArray2(xs, 2).a.b.c;  # reference to `xs[2].a.b.c`

julia> r[]
2

julia> r[] = 2000;

julia> xs
2-element Array{NamedTuple{(:a, :f),Tuple{NamedTuple{(:b, :e),Tuple{NamedTuple{(:c, :d),Tuple{Int64,Int64}},Int64}},Int64}},1}:
 (a = (b = (c = 1, d = 2), e = 3), f = 4)
 (a = (b = (c = 2000, d = 4), e = 6), f = 8)

Though it’s not as efficient as I hoped. For example, looking at the code generated for r[], not all unsafe_convert calls are inlined.

Is there a flavor of this without the naming option? Namely just Mutable Tuples with the same trick you used in your package?