I hope you are well! I have a question regarding preallocating Vectors of NamedTuples. I have a custom struct that contains a field that is a NamedTuple.
```julia
mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end
mystruct = MyCustomStruct((a = 1, b = 2.0, c = [3]))
```
Now, in a specific function, I have several `MyCustomStruct` instances stored in a vector, and I need to resample their `param` fields according to some index. Ideally, I would create a preallocated container so that I can resample without allocations:
```julia
mystruct_vec = [deepcopy(mystruct) for _ in 1:3]
buffer = [deepcopy(mystruct.param) for _ in 1:3]
shuffled_indices = [1, 1, 1] # Replace param fields of all mystruct_vec structs with param field from first mystruct_vec struct

function resample_param(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int64})
    # First shuffle the 'structᵥ' fields into the correct order from 'index', and temporarily store them in buffer.
    for iter in eachindex(structᵥ)
        buffer[iter] = structᵥ[index[iter]].param
    end
    # Then use correct buffer order to fill 'structᵥ' fields.
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = buffer[iter]
    end
    return nothing
end

resample_param(mystruct_vec, buffer, shuffled_indices)

using BenchmarkTools
@btime resample_param($mystruct_vec, $buffer, $shuffled_indices) # 14.228 ns (0 allocations: 0 bytes)
```
Great! This has 0 allocations, but unfortunately the resampled structs now share the same underlying array (a pointer issue):
```julia
mystruct_vec[1].param.c # [3]
mystruct_vec[2].param.c # [3]
mystruct_vec[2].param.c[1] = 123
mystruct_vec[1].param.c # [123]  This should still be [3]!
mystruct_vec[2].param.c # [123]
```
I can avoid this by deepcopying the fields inside the function, but that defeats the purpose of the preallocation and causes lots of allocations. Is there a way to make this performant without the aliasing problem?
As you probably realize, the issue is that you are setting every entry of `buffer` to the `param` of the same single entry of `mystruct_vec`, and then assigning those back by reference, so all the structs share the same array for field `c`. Mutating it through one struct therefore affects all of them. You can copy every field of a NamedTuple via `map(copy, nt)` where `nt::NamedTuple`. This solves your problem, but it allocates new arrays.
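A short illustration of the `map(copy, nt)` suggestion on a standalone NamedTuple shaped like `param`. `map` over a NamedTuple returns a NamedTuple with the same field names, so the result gets its own `c` array. This is exactly the allocation the preallocated buffer was meant to avoid.

```julia
nt = (a = 1, b = 2.0, c = [3])

# `map` over a NamedTuple returns a NamedTuple with the same field names.
# `copy` gives the Vector field a fresh array; for the numbers it simply
# returns the same value.
nt2 = map(copy, nt)

nt2.c[1] = 123             # mutate the copy's array
println(nt.c)              # [3]: the original array is untouched
println(nt2.c === nt.c)    # false: the two NamedTuples no longer share `c`
```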
If your `c` array always has the same short length, an `SVector` (which is immutable) from StaticArrays would be a great fit here and would solve your issue without any further changes.
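A minimal sketch of why an immutable field sidesteps the problem. To stay dependency-free it swaps a plain `Tuple` in for StaticArrays' `SVector`. This is an assumption, but `SVector` is likewise immutable and should behave the same way here. The original `resample_param` is used unchanged. Sharing an immutable value between structs is harmless because "changing" `c` for one struct means replacing that struct's whole `param`, which leaves the others alone.

```julia
mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end

# The original two-pass resampling, unchanged.
function resample_param(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int})
    for iter in eachindex(structᵥ)
        buffer[iter] = structᵥ[index[iter]].param
    end
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = buffer[iter]
    end
    return nothing
end

# Assumption: a 1-element Tuple stands in for an SVector as the `c` field.
mystruct_vec = [MyCustomStruct((a = i, b = 2.0 * i, c = (10 * i,))) for i in 1:3]
buffer = [deepcopy(s.param) for s in mystruct_vec]
resample_param(mystruct_vec, buffer, [1, 1, 1])   # every struct now holds struct 1's param

# A Tuple cannot be mutated in place, so updating struct 2 means building a
# new NamedTuple (same type) and assigning it; structs 1 and 3 are unaffected.
mystruct_vec[2].param = merge(mystruct_vec[2].param, (c = (123,),))
```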
But if you really want a mutable object as a field of your NamedTuple, you will need to either overwrite the existing array in the buffer in place or allocate a new array each time.
For example,
```julia
updateparam!(x::NamedTuple, y::NamedTuple) = (; a = y.a, b = y.b, c = copy!(x.c, y.c))
```
will recycle `x.c` by overwriting it with the contents of `y.c`, saving the allocation. If the sizes can differ, you may need `copy!(resize!(x.c, length(y.c)), y.c)` instead.
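The `updateparam!` helper can be slotted into both passes of the original loop. This is a minimal sketch, assuming every `param` has exactly the fields `a`, `b`, `c` with a `Vector` in `c`, and that `buffer` and the structs start out owning separate arrays (for example via `deepcopy`). `resample_param_copy!` is a hypothetical name. The second pass also copies contents rather than assigning the buffer entry directly. Otherwise the structs would end up sharing arrays with `buffer`, and the next resample would overwrite them.

```julia
mutable struct MyCustomStruct{T<:NamedTuple}
    param::T
end

# Helper from the answer: a new NamedTuple whose `c` is x's own array,
# overwritten with the contents of y.c.
updateparam!(x::NamedTuple, y::NamedTuple) = (; a = y.a, b = y.b, c = copy!(x.c, y.c))

# Hypothetical name. Both passes copy *contents* into arrays the destination
# already owns, so no two structs (and no struct and buffer entry) share `c`.
function resample_param_copy!(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int})
    # Pass 1: gather the selected params into the buffer's own arrays.
    for iter in eachindex(structᵥ)
        buffer[iter] = updateparam!(buffer[iter], structᵥ[index[iter]].param)
    end
    # Pass 2: copy back into each struct's own arrays, not just the reference.
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = updateparam!(structᵥ[iter].param, buffer[iter])
    end
    return nothing
end

mystruct_vec = [MyCustomStruct((a = i, b = 2.0 * i, c = [10 * i])) for i in 1:3]
buffer = [deepcopy(s.param) for s in mystruct_vec]
resample_param_copy!(mystruct_vec, buffer, [1, 1, 1])

mystruct_vec[2].param.c[1] = 123   # mutate one struct's array...
mystruct_vec[1].param.c            # ...the others still hold [10]
```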
Is there a way to make the dot syntax work for a NamedTuple so that I can use the preallocated buffers? There does not seem to be a method implemented for that case, and I am struggling to come up with an appropriate method to write myself:
```julia
function resample_param2(structᵥ::Vector{<:MyCustomStruct}, buffer::Vector{<:NamedTuple}, index::Vector{Int64})
    # First shuffle the 'structᵥ' fields into the correct order from 'index', and temporarily store them in buffer.
    for iter in eachindex(structᵥ)
        buffer[iter] .= structᵥ[index[iter]].param # CHANGE HERE
    end
    # Then use correct buffer order to fill 'structᵥ' fields.
    for iter in eachindex(structᵥ)
        structᵥ[iter].param = buffer[iter]
    end
    return nothing
end

mystruct_vec = [deepcopy(mystruct) for _ in 1:3]
buffer = [deepcopy(mystruct.param) for _ in 1:3]
shuffled_indices = [1, 1, 1]
resample_param2(mystruct_vec, buffer, shuffled_indices) # ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
```
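As the error says, Base reserves broadcasting over `NamedTuple`s, so `.=` is not available here. One alternative is `map`, which accepts several NamedTuples as long as they have the same field names. The sketch below uses a hypothetical helper, `refill_param`. It copies array fields into the destination's existing arrays with `copy!` and takes every other (immutable) field from the source. A NamedTuple is itself immutable, so the result still has to be assigned back, e.g. `buffer[iter] = refill_param(buffer[iter], structᵥ[index[iter]].param)`.

```julia
# Hypothetical helper: returns a NamedTuple with the same names as `src`.
# Array fields are copied *into* dest's existing arrays (no new array is
# created); all other fields are taken from `src` as-is.
refill_param(dest::NamedTuple{N}, src::NamedTuple{N}) where {N} =
    map((d, s) -> s isa AbstractArray ? copy!(d, s) : s, dest, src)

src = (a = 1, b = 2.0, c = [3])
buf = (a = 0, b = 0.0, c = [0])     # preallocated buffer entry with its own array
buf_c = buf.c                       # remember the buffer's array

# Stands in for `buffer[iter] .= structᵥ[index[iter]].param`, which errors.
buf = refill_param(buf, src)

src.c[1] = 123                      # mutating the source afterwards...
println(buf.c)                      # ...does not affect the buffer: [3]
println(buf.c === buf_c)            # true: the buffer's original array was reused
```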