Preallocating Vectors of NamedTuples

Dear all,

I hope you are well! I have a question regarding preallocating Vectors of NamedTuples. I have a custom struct that contains a field that is a NamedTuple.

mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end
mystruct = MyCustomStruct( (a=1, b=2.0, c = [3]))

Now in a specific function, I have several ‘MyCustomStruct’ structs in the form of a vector of ‘MyCustomStruct’, and I need to resample the param fields of these structs according to some index. Ideally, I can create a pre-allocation container so I can resample without allocations:

mystruct_vec = [deepcopy(mystruct) for _ in 1:3]
buffer = [deepcopy(mystruct.param) for _ in 1:3]
shuffled_indices = [1,1,1] # Replace param fields of all mystruct_vec structs with param field from first mystruct_vec struct

function resample_param(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int64})
    #First shuffle the 'structᵥ' fields into the correct order from 'index', and temporarily store them in buffer.
    for iter in eachindex(structᵥ)
        buffer[iter] = structᵥ[index[iter]].param
    end
    #Then use correct buffer order to fill 'structᵥ' fields.
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = buffer[iter]
    end
    return nothing
end
resample_param(mystruct_vec, buffer, shuffled_indices)
using BenchmarkTools
@btime resample_param($mystruct_vec, $buffer, $shuffled_indices) #14.228 ns (0 allocations: 0 bytes)

Great! This has 0 allocations, but unfortunately I have some pointer issues here:

mystruct_vec[1].param.c #[3]
mystruct_vec[2].param.c #[3]

mystruct_vec[2].param.c[1] = 123

mystruct_vec[1].param.c #[123] #This should still be [3]!
mystruct_vec[2].param.c #[123]

I can remove the pointer issue by deepcopying the fields inside the function, but that defeats the purpose of preallocating and allocates a lot. Is there a way to make this performant without pointer issues?

As you probably understand, the issue here is that you’re setting every entry of buffer to the same single entry of mystruct_vec. Assigning a NamedTuple copies the values of a and b but only the reference to the array in c, so all entries end up sharing one array for field c. Thus, mutating one affects all of them. You can copy every entry of a NamedTuple via map(copy, nt) where nt::NamedTuple. This will solve your problem, but it will allocate new arrays.
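To make the sharing concrete, here is a small sketch using the same field layout as in the question. The names shared and fresh are only for illustration.

```julia
nt = (a = 1, b = 2.0, c = [3])

shared = nt              # plain assignment: shared.c and nt.c are the same array
fresh  = map(copy, nt)   # copies each field, so fresh.c is a newly allocated array

shared.c === nt.c        # true
fresh.c === nt.c         # false

fresh.c[1] = 123         # only touches the copy
nt.c                     # still [3]
```

The map(copy, nt) line is the one that allocates, which is why it does not fit a preallocated buffer on its own.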

If your c array is always the same short length, an SVector (which is immutable) from StaticArrays would be great here and would solve your issue without the need for further changes.
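A rough sketch of what that could look like, assuming the StaticArrays package is installed and that c always has length 1. The struct is the one from the question, and the function is a compact copy of the original resample_param with the type annotations dropped. Since an SVector is stored by value, plain assignment no longer shares anything.

```julia
using StaticArrays  # third-party package, assumed to be installed

# Same definition as in the question, repeated so the sketch is self-contained.
mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end

# Compact copy of the original function: plain assignment is enough here.
function resample_param(structs, buffer, index)
    for i in eachindex(structs)
        buffer[i] = structs[index[i]].param
    end
    for i in eachindex(structs)
        structs[i].param = buffer[i]
    end
    return nothing
end

mystruct_vec = [MyCustomStruct((a = i, b = 2.0 * i, c = SA[3 * i])) for i in 1:3]
buffer = [s.param for s in mystruct_vec]
resample_param(mystruct_vec, buffer, [1, 1, 1])

# An SVector cannot be mutated in place, so "changing" c means storing a new tuple.
p = mystruct_vec[2].param
mystruct_vec[2].param = merge(p, (c = SA[123],))

mystruct_vec[1].param.c  # unaffected by the change to the second struct
```

The trade-off is that c[1] = 123 no longer works; every update has to rebuild the param tuple, as in the merge call above.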

But if you really want a mutable object for a field in your NamedTuple, you’ll need to either overwrite the existing array in the buffer or allocate a new array each time.

For example,
updateparam!(x::NamedTuple,y::NamedTuple) = (;a=y.a,b=y.b,c=copy!(x.c,y.c))
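will recycle x.c by overwriting it with the contents of y.c, saving the allocation. If the sizes can differ, copy! on a Vector should already resize x.c to match (Julia 1.1 and later, as far as I know), so copy!(resize!(x.c,length(y.c)),y.c) is only needed if you want the resize to be explicit.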
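Plugged into the resampling function from the question, that helper could be used roughly as follows. This is only a sketch: resample_param! is a made-up name, and it assumes every struct and every buffer entry owns its own c array to begin with (as they do after the deepcopy calls in the question). Note that the second loop also goes through updateparam!; a plain assignment there would make the structs share arrays with the buffer again.

```julia
# Same definition as in the question, repeated so the sketch is self-contained.
mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end

# Helper from above: build a new tuple, but recycle x.c as storage for y.c.
updateparam!(x::NamedTuple, y::NamedTuple) = (; a = y.a, b = y.b, c = copy!(x.c, y.c))

# Hypothetical variant of resample_param that copies contents instead of references.
function resample_param!(structs::Vector{<:MyCustomStruct},
                         buffer::Vector{<:NamedTuple},
                         index::Vector{Int})
    # Copy the selected params into the buffer's own arrays.
    for i in eachindex(structs)
        buffer[i] = updateparam!(buffer[i], structs[index[i]].param)
    end
    # Copy back into each struct's own arrays, so nothing is shared afterwards.
    for i in eachindex(structs)
        structs[i].param = updateparam!(structs[i].param, buffer[i])
    end
    return nothing
end

mystruct_vec = [MyCustomStruct((a = i, b = 2.0 * i, c = [3 * i])) for i in 1:3]
buffer = [deepcopy(s.param) for s in mystruct_vec]

resample_param!(mystruct_vec, buffer, [1, 1, 1])

mystruct_vec[2].param.c[1] = 123
mystruct_vec[1].param.c  # not affected by the mutation of the second struct
```

I have not benchmarked this, so check with @btime whether it is allocation-free for your actual field types.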

Thank you for your answer!

Is there a way to make the dot syntax work for a NamedTuple so that I can use the preallocated buffers? Currently there does not seem to be a method implemented for that case, and I am struggling to think of an appropriate method to write myself:


function resample_param2(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int64})
    #First shuffle the 'structᵥ' fields into the correct order from 'index', and temporarily store them in buffer.
    for iter in eachindex(structᵥ)
        buffer[iter] .= structᵥ[index[iter]].param #CHANGE HERE
    end
    #Then use correct buffer order to fill 'structᵥ' fields.
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = buffer[iter]
    end
    return nothing
end

mystruct_vec = [deepcopy(mystruct) for _ in 1:3]
buffer = [deepcopy(mystruct.param) for _ in 1:3]
shuffled_indices = [1,1,1]

resample_param2(mystruct_vec, buffer, shuffled_indices) #ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved

Not really, no. A NamedTuple is not mutable so you can’t .= into one. Hence the need for a specialized function to manually handle each field.
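If writing one such function per field layout gets tedious, a generic version can be sketched with map over two NamedTuples. The names recycle! and copyfields! are made up for this example, and the sketch assumes both tuples have the same field names and types. Because the NamedTuple itself is immutable, the result still has to be assigned back; only the arrays inside it are reused.

```julia
# Hypothetical helpers, not part of Base.
recycle!(dst::AbstractArray, src::AbstractArray) = copy!(dst, src)  # overwrite dst in place
recycle!(dst, src) = src                                           # other values: just take src

# Field-by-field copy: arrays in dst are reused, everything else comes from src.
copyfields!(dst::NT, src::NT) where {NT<:NamedTuple} = map(recycle!, dst, src)

x = (a = 0, b = 0.0, c = [0])
y = (a = 1, b = 2.0, c = [3])

old_c = x.c
x = copyfields!(x, y)   # rebinding is required, since x itself cannot be mutated

x.c === old_c           # true: the preallocated array was reused
x.c === y.c             # false: nothing is shared with y
```

In the resampling loop this would replace the failing line with buffer[iter] = copyfields!(buffer[iter], structᵥ[index[iter]].param).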