Macro for merging NamedTuples and adding suffixes in a type-stable way

I’m trying to create a macro that will merge named tuples and change the key names by adding suffixes, in a way that’s type stable.

Suppose that it’s guaranteed that nt1 and nt2 are the same type of named tuple, i.e. they have the same key names and the same value types. For example, nt1 and nt2 could be:

nt1 = (; a=1, b=2)
nt2 = (; a=3, b=4)

I would like a function mymerge(xx::Vararg{NT,N}) where {N,NT<:NamedTuple} such that

mymerge(nt1, nt2) == (; a_1=1, b_1=2, a_2=3, b_2=4)

Suppose I provide a function that adds suffixes, say

_rename_suffix(x::Union{AbstractString,Symbol}, suffix) = Symbol(x, "_", suffix)

The naive way of doing this (by using merge and building NamedTuples with suffixed keys inside a tuple comprehension) doesn’t work for me, because it’s not type stable.
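
For concreteness, here is a minimal sketch of one possible reading of that naive approach. The name mymerge_naive is made up for illustration, and ntuple stands in for the tuple comprehension. The suffixed names are built from run-time values, so the compiler most likely cannot infer the concrete result type (checking with @code_warntype should show this).

```julia
_rename_suffix(x::Union{AbstractString,Symbol}, suffix) = Symbol(x, "_", suffix)

# Hypothetical naive version: rename the keys of each argument at run time,
# then merge the renamed NamedTuples.
function mymerge_naive(xx::Vararg{NT,N}) where {N,NT<:NamedTuple}
    renamed = ntuple(N) do n
        nt = xx[n]
        # Tuple of suffixed Symbols, e.g. (:a_1, :b_1) for n == 1
        new_names = map(k -> _rename_suffix(k, n), keys(nt))
        NamedTuple{new_names}(values(nt))
    end
    return merge(renamed...)
end

nt1 = (; a=1, b=2)
nt2 = (; a=3, b=4)
mymerge_naive(nt1, nt2)  # (a_1 = 1, b_1 = 2, a_2 = 3, b_2 = 4)
```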

So I’m trying to achieve the same behavior by using a @generated function.

Here’s what I have so far:

@generated function mymerge(xx::Vararg{NT,N}) where{N,names_,types,NT<:NamedTuple{names_,types}}
    TT = NamedTuple{
        tuple(collect(Iterators.flatten(_rename_suffix.(names_, n) for n in 1:N))...),
        Tuple{Iterators.flatten(fieldtypes(types) for _ in 1:N)...}
    }
    t = Iterators.flatten([[Expr(:call, :getfield, x, Meta.quot(name)) for name in names_] for x in xx])  # this line is wrong
    Expr(:call, TT, t)
end

I believe that the type TT is correct. However, t is not. I get the error

nt1 = (; a=1, b=2)
nt2 = (; a=3, b=4)
mymerge(nt1, nt2)
MethodError: Cannot `convert` an object of type Expr to an object of type Int64

If this wasn’t a generated function, the way to do this could be

TT(tuple(Iterators.flatten(values(x) for x in xx)...))

but of course that doesn’t work as @generated functions can’t access the value of the arguments.
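
One way the generated function might be repaired is sketched below. This is not a confirmed answer from the thread, just a sketch following the same pattern as Base's own generated merge. My reading of the error is that the iterator of Expr objects is embedded in the call as a value instead of being spliced in as code, and that xx inside the generator holds types, not values. The sketch indexes xx[n] in the returned expression, splices the getfield calls into a tuple expression, and lets the field types be inferred from the values.

```julia
_rename_suffix(x::Union{AbstractString,Symbol}, suffix) = Symbol(x, "_", suffix)

@generated function mymerge(xx::Vararg{NT,N}) where {N,names_,types,NT<:NamedTuple{names_,types}}
    # Suffixed key names, computed from type information only.
    new_names = Tuple(_rename_suffix(name, n) for n in 1:N for name in names_)
    # One expression per output field: getfield(xx[n], :name).
    # In the returned code, `xx` refers to the run-time tuple of arguments.
    vals = [:(getfield(xx[$n], $(QuoteNode(name)))) for n in 1:N for name in names_]
    # Splice the expressions into a tuple expression and wrap it in the NamedTuple type.
    return :(NamedTuple{$new_names}(($(vals...),)))
end

nt1 = (; a=1, b=2)
nt2 = (; a=3, b=4)
mymerge(nt1, nt2)  # (a_1 = 1, b_1 = 2, a_2 = 3, b_2 = 4)
```

Note that _rename_suffix has to be defined before the generated function, since the generator calls it while building the expression.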


In DataManipulation.jl, we have various advanced indexing operations for NamedTuples that are zero-cost (i.e., resolved at compile time):

julia> nt1 = (; a=1, b=2)
julia> nt2 = (; a=3, b=4)

julia> using DataManipulation

# regex-based rename
# sr"" is a static regex, like r"" but evaluated at compile time
# ss"" is a static substitution string, like s"" but evaluated at compile time
julia> nt1[sr".*" => ss"\0_1"]
(a_1 = 1, b_1 = 2)

# add _1 to nt1 keys, _2 to nt2 keys:
julia> ntm = merge(nt1[sr".*" => ss"\0_1"], nt2[sr".*" => ss"\0_2"])
(a_1 = 1, b_1 = 2, a_2 = 3, b_2 = 4)

It also has the inverse operation, nest:

julia> nt1, nt2 = nest(ntm, sr"(.*)_(\d+)" => (ss"\2", ss"\1"))
(var"1" = (a = 1, b = 2), var"2" = (a = 3, b = 4))

julia> nt1
(a = 1, b = 2)

I’ve been planning to add unnest, which would simplify the first example above; I simply haven’t had a real use case yet 🙂


I tried to follow the usage examples for the DataManipulation package functions, given here.

I have not read the manuals of the other packages used, but I understood (at least I think) all the examples except this one:

data_2_flat = @p data_2 |> flatmap(_.t, (;_..., t=_2))

I tried to simulate the task of the flatmap function this way

[(;t=e,NamedTuple{filter(e->e !=:t, propertynames(d))}(d)...) for d in data_2 for e in d.t ]

# or better

[(;d...,t=e) for d in data_2 for e in d.t ]

but I’m curious about how the expression syntax works.
A step-by-step explanation would be appreciated.

There are two orthogonal parts to understand here.

  • First is the piping syntax that comes from DataPipes.jl, the @p macro. As documented there, this call expands to
flatmap(x -> x.t, (x, y) -> (;x..., t=y), data_2)
  • Then, what flatmap does. As documented, that’s the flatmap behavior with two functions:

flatmap(fₒᵤₜ, fᵢₙ, X): apply fₒᵤₜ to all elements of X, and apply fᵢₙ to the results. Basically, [fᵢₙ(x, y) for x in X for y in fₒᵤₜ(x)].

flatmap() is a more type-preserving, type-stable, and performant version of this common operation.
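
To make the two-function form concrete, here is the equivalent comprehension written in plain Julia, without the package. The contents of data_2 and the names f_out and f_in are made up for illustration, since the thread doesn't show the real data.

```julia
# Hypothetical sample data; the real `data_2` is not shown in this thread.
data_2 = [(; id=1, t=[10, 20]), (; id=2, t=[30])]

f_out = x -> x.t                  # what `_.t` expands to: pick the inner collection
f_in  = (x, y) -> (; x..., t=y)   # what `(;_..., t=_2)` expands to: same row, scalar t

# The comprehension from the flatmap docstring, spelled out:
flat = [f_in(x, y) for x in data_2 for y in f_out(x)]
# 3-element Vector: (id = 1, t = 10), (id = 1, t = 20), (id = 2, t = 30)
```

Each outer element is repeated once per element of its t field, with t replaced by that single element.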

And DataManipulation.jl is intended as an everything-included package for a wide variety of common data manipulation tasks in Julia. It both reexports smaller packages (like DataPipes.jl and FlexiMaps.jl, see the full list in the docs) and defines more functions itself: those for which I didn’t see a more natural home.
Let me know if something is still not clear here or in the docs!

(Sorry to the OP, this is completely unrelated to the question of this thread 🙂)
