Filtering keys out of named tuples

What’s the right way to filter a key out of a named tuple? Example:

nt = (;a = 1, b = 2)
filter(x->x[1] == :a,nt) # Errors
filter(x->x[1] == :a,Base.Pairs(nt,keys(nt))) # Allocates an entire dictionary

What’s the right way to do this and get another NamedTuple back out instead of going through a dictionary and losing all inference/performance? Does this have an issue in Base already?

1 Like

I do believe the following is faster, can you benchmark it for your use case and give some feedback?

julia> filter_nt_fields(f, nt) = NamedTuple{filter(f, keys(nt))}(nt)
filter_nt_fields (generic function with 1 method)

julia> nt = (;a = 1, b = 2, c = 3)
(a = 1, b = 2, c = 3)

julia> filter_nt_fields(!=(:b), nt)
(a = 1, c = 3)

Note, however, that this is an inherently type-unstable operation.

Not with constant prop. You should be able with constant prop to know takeout(nt,(:b,)) and know that it will be (:a,:c) left. We just don’t have a method for it :sweat_smile:. So yeah, maybe filter isn’t the right solution but that means there’s a missing verb.

1 Like

The keys method for NamedTuple return a Tuple and the filter method for Tuple acts directly over the Tuple (do not collect in a Vector) and return a Tuple, so theoretically, there the potential for constant propagation all the way through, did you try to see if it happens in your use case?

You can do tail((; b=nothing, nt...,)), which seems fast:

julia> nt = (a=1, b=2, c=3);

julia> @inline takeout(kill::Symbol, nt::NamedTuple) = Base.tail(merge(NamedTuple{(kill,)}((nothing,)), nt));

julia> @btime takeout(:b, $nt)
  min 0.875 ns, mean 0.946 ns (0 allocations)
(a = 1, c = 3)

(This will break if nt doesn’t have :b, of course.)

1 Like

Does this help, or you need more general filtering with an arbitrary predicate?

julia> nt = (;a = 1, b = 2)
(a = 1, b = 2)

julia> nt[(:a,)]
(a = 1,)

julia> Base.structdiff(nt, NamedTuple{(:a,)})
(b = 2,)

julia> using Accessors
julia> @delete nt.a
(b = 2,)

julia> using InvertedIndices
julia> nt[Not(:a)]  # if/when https://github.com/JuliaData/InvertedIndices.jl/pull/29 lands
(b = 2,)
julia> @btime nt_b=NamedTuple{filter(!=(:b), keys($nt))}($nt)
  0.001 ns (0 allocations: 0 bytes)
(a = 1, c = 3)

julia> @btime $nt[filter(!=(:b), keys($nt))]
  0.001 ns (0 allocations: 0 bytes)
(a = 1, c = 3)

Much of the benchmarking above presumes you know ahead of time the one symbol that want to remove, and you always want to remove that same symbol. The timings and allocations increase when both the named tuple and the symbol are not fixed. You may want to remove more than one key. Base uses Base.structdiff for this kind of manipulation. One approach:

drop(nt::NamedTuple, key::Symbol) = 
    Base.structdiff(nt, NamedTuple{(key,)})
drop(nt:: NamedTuple, keys::NTuple{N,Symbol}) where {N} =
    Base.structdiff(nt, NamedTuple{keys})

When you do know which symbol[s] are to be dropped from the keys, it is much more performant to work with function[s] that use that information immediately. When the same field[s] is/are removed in different parts of the source code, you may prefer to specialize the names.

drop_baz(nt) = Base.structdiff(nt, NamedTuple{(:baz,)})
drop_b_c(nt) = Base.structdiff(nt, NamedTuple{(:b, :c)})

Macroizing that process makes sense.

2 Likes

Performance of these variants is exactly the same under constant propagation, so there is no need to define functions separately for each key name:

julia> nt = (a=nothing, b=2, c='a')

julia> @btime Base.structdiff($nt, NamedTuple{(:a,)})
  1.589 ns (0 allocations: 0 bytes)
(b = 2, c = 'a')

# your drop() function, with @inline:
julia> @inline drop(nt::NamedTuple, key::Symbol) = 
           Base.structdiff(nt, NamedTuple{(key,)})
julia> @btime drop($nt, :a)
  1.587 ns (0 allocations: 0 bytes)
(b = 2, c = 'a')

# implementation in an existing package:
julia> using Accessors
julia> @btime @delete $nt.a
  1.586 ns (0 allocations: 0 bytes)
(b = 2, c = 'a')

1 Like

Yes. I was aware of the timing information; imo it is better practice to wrap uses of nonexported Base functions inside a user function, which may be inlined.

The multiple naming was to make clear the different specific intents, not to imply that need be done. I have modified the comment to clarify.