Subset a NamedTuple?

Is there a function to subset a NamedTuple? Like

nt = (a = 1, b = "hello", c = 3.14)
s = Base.subset(nt, :a, :c)

The above code, which doesn’t work, was generated by ChatGPT. It also generated another solution, which does work:

nt = (a = 1, b = "hello", c = 3.14)
keys = (:a, :c)
s = NamedTuple{keys}(nt)

So, was a function like select once provided? I like the former approach only because its intention is a bit clearer. Of course, I can live with the latter, or I can write my own select function . . .

julia> nt = (a = 1, b = "hello", c = 3.14)
(a = 1, b = "hello", c = 3.14)

julia> nt[(:a, :c)]
(a = 1, c = 3.14)

I resorted to the documentation instead of ChatGPT:

Accessing the value associated with a name in a named tuple can be done using
field access syntax, e.g. x.a, or using getindex, e.g. x[:a] or x[(:a, :b)].
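
For comparison, here is a minimal sketch that puts the two working approaches side by side and adds a small wrapper in the spirit of the subset(nt, :a, :c) call from the question. The name select_fields is made up for illustration and is not part of Base; as far as I know, indexing a NamedTuple with a tuple of symbols needs Julia 1.7 or later.

```julia
nt = (a = 1, b = "hello", c = 3.14)

# Indexing with a tuple of symbols returns a smaller NamedTuple.
by_index = nt[(:a, :c)]

# The type-parameter constructor gives the same result.
by_type = NamedTuple{(:a, :c)}(nt)

# Hypothetical helper (not in Base): the varargs symbols arrive as a tuple,
# which can be passed straight to getindex.
select_fields(nt::NamedTuple, names::Symbol...) = nt[names]

by_helper = select_fields(nt, :a, :c)
```

All three should give the same NamedTuple, so the wrapper only buys a more readable call site.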

Ah, the wonderful world of AI hallucinations. Just curious: what’s your use case?

Ah! Each time I ask a question here, I feel stupid. This rarely happened before I started using Julia. There is something about Julia that stumps Google and Bing.

Not in the present case, but it’s often hard to find the relevant information in the official documents. (In the present case, your comment made me locate the feature in the official documentation of NamedTuple. So, my bad, in the present case.)

So, I gave up. First Google and Bing. If it fails, ChatGPT. If it fails, I ask questions here. I should really search the official documents . . .

vars = read_data() # returns a large NamedTuple
subset = vars[(:a, :x, :q, :ii)] # Needs only a subset
func_with_optional_keyword_args(; subset...)

I don’t think this can be any simpler and clearer than this.
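
A runnable sketch of that pattern, in case it helps others. Here read_data and takes_four are hypothetical stand-ins with made-up fields, not the real functions; the point is only that the subset splats cleanly while the full tuple is rejected.

```julia
# Hypothetical stand-ins for read_data and the receiving function.
read_data() = (a = 1, x = 2.0, q = "q", ii = 4, unused = :ignored)
takes_four(; a = 0, x = 0.0, q = "", ii = 0) = (a, x, q, ii)

vars = read_data()
subset = vars[(:a, :x, :q, :ii)]

# Only the four selected fields are passed as keywords.
result = takes_four(; subset...)

# Splatting the full tuple fails because `unused` is not an accepted keyword.
rejected = try
    takes_four(; vars...)
    false
catch err
    err isa MethodError
end
```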

It is admittedly hard to find information in the official docs! But it is also the most reliable source. I don’t use the web version very often, but I frequently look into the REPL’s help (in your case, ?NamedTuple).

I see. If func_with_optional_keyword_args is yours, you could also adapt the signature to func_with_optional_keyword_args(; a = :foo, x = :bar, q = :baz, ii = :qux, kwargs...) and keep the whole named tuple.
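
A small sketch of what that signature does, using a hypothetical function report with made-up defaults: the named keywords are picked out and everything else is collected in kwargs, where it can simply be ignored.

```julia
# Hypothetical function: two named keywords, everything else lands in kwargs.
function report(; a = :foo, x = :bar, kwargs...)
    return (a = a, x = x, ignored = collect(keys(kwargs)))
end

full = (a = 1, x = 2, extra = 3)

# The whole named tuple can be splatted; `extra` is absorbed by kwargs.
out = report(; full...)
```

The trade-off is that a misspelled keyword is absorbed the same way, without any error.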

That was my first thought: Accept the whole NamedTuple on the receiving function, but then my second thought was, why make it more complex than necessary? This was the starting point of this search of mine, in the first place.

Even though the data producing function read_data() and the data processing function func_with_optional_keyword_args() are in the same problem domain and share the names and meanings of the variables, I want to minimize the interface to clarify the dependency. I want to say, “this function requires this and that variables, not more.”

Otherwise, all functions end up appearing to depend on all variables out there:

(v1, v2, v3, . . . . ) = read_data()

function dothis()
  # use only v3, and v5
end
function dothat()
  # use only v1, and v12
end

This style “works” at first, but when you want to lift one of the functions you wrote some time ago for another program, it’s not immediately clear which variables the function depends on. So, it’s always desirable to minimize the function interface.

I agree with your idea of minimizing the function interface. However, I am wondering: why is the first option preferred over the second in the following?

# Option 1
vars = read_data()
subset = vars[(:a, :x, :q, :ii)]
func_with_optional_keyword_args(; subset...)

# Option 2
vars = read_data()
func_with_optional_keyword_args(; vars.a, vars.x, vars.q, vars.ii)

You have to enumerate the arguments anyway, and the second option appears more explicit and transparent to me (it is not immediately apparent which arguments are set when calling f(; subset...)). Is it about brevity?

It probably mixed this up with code using DataFrames.subset, and was possibly also confused by code using R’s subset. That’s a downside of a model trained on so much: it’s not very good at context, though prompts do help.

I tried asking this on JuliaHub’s AskAI just to see if that’s a better option.

It recommended select from NamedTupleTools.jl, which at least seems to be real.

It did not mention the simple getindex option from the solution here though. I tried asking “Isn’t there a simpler way without external packages?” to see if it would arrive there. That somehow made it lose track of the context entirely, and talk in generic terms about using external packages.

So the answer by itself is somewhat better, but the UI is pretty atrocious (no history, no multiple chat threads, no editing even), and it seems to have a hard time keeping to the thread of the conversation, so I can’t really recommend this as a better chat interface. It’s an alternative to keep in mind and worth giving a try from time to time though.

That, and avoiding additional clutter. Reducing clutter leads to increased readability. My real code looks like this:

ful = get_data()
pij = ful.pij
ginv = 1/ful.gravit
sub = ful[(:dx, :dz, :bvf2e, :cee, :iocn, :kocn, :f0)]
args = (; sub..., ginv=ginv)
accuracy(view(pij,:,:,1); args...)

Before arriving at the present solution, my code was like

vars = get_data()
ginv = 1/vars.gravit
accuracy(view(vars.p2d, :,:,1) ; dx=vars.dx, dz=vars.dz, bvf2e=vars.bvf2e, c=vars.c
   ,f0=vars.f0, ginv=ginv, iocn=vars.iocn, kocn=vars.kocn)

See? The clutter of vars. and the repetition of the x = vars.x pattern reduce clarity in the latter case.

You exaggerate. The variables are listed one line or two before the function call:

args = vars[ (:a, :c, :e, :f, . . .)]
func(; args...) # see the previous line! It's not far away.

vs

func(; a = vars.a, b = vars.b, c = vars.c, e = vars.e, f = vars.f)

Do you really think that the second is clearer than the first?

If you really prefer the second style, what about

func(; vars[(:a, :b, :c, :e, :f)]...)

? Explicit and brief. For my real code, I don’t take the 3rd approach because I need to calculate one argument (ginv = 1/vars.gravit) and separate out one positional argument (pij = vars.pij), but if I just pass a subset of the variables, I would use the 3rd style, or even

func(; read_data()[(:a, :b, :c, :e, :f)]...)

if I don’t need the full result from read_data() further.
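
For the case with a computed argument, a minimal sketch might look like the following. The data, the field values, and accuracy_demo are all hypothetical, loosely modelled on my real code.

```julia
# Hypothetical data and receiver, loosely modelled on the posts above.
get_data() = (dx = 0.5, dz = 0.25, gravit = 10.0, other = "not needed")
accuracy_demo(; dx, dz, ginv) = dx * dz * ginv

ful = get_data()
ginv = 1 / ful.gravit

# Splat the subset and append the computed value;
# a bare `ginv` after the semicolon means `ginv = ginv`.
args = (; ful[(:dx, :dz)]..., ginv)
value = accuracy_demo(; args...)
```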

I don’t want to steal your topic — especially when you already got an answer to your question. I just want to note that it is not required to repeat the names of the keyword arguments; they can be derived from the variable names that you pass to the function. So I would indeed prefer

ful = get_data()
pij = ful.pij
ginv = 1/ful.gravit
accuracy(view(pij,:,:,1); 
         ful.dx, ful.dz, # I am assuming these two are related...
         ful.bvf2e,
         ful.cee,
         ful.iocn, ful.kocn, # ... as well as these two
         ful.f0, 
         ginv)

But of course — stick to your preferred way! I was just curious.

I’d write:

function read_data()
...
  return (; v1, v2, etc...) # NamedTuple!
end

(;v1, v2, v3, . . . . ) = read_data()

function dothis(v3, v5)
  # do this
end
function dothat(v1,  v12)
  # do that
end

or alternatively

function dothis(; v3, v5)
  # do this
end
function dothat(; v1,  v12)
  # do that
end

or (what I actually usually do)

nt = read_data()

function dothis(; v3, v5, kwargs...)  # take v3 and v5 by name, ignore the rest
  x = somewhat(v3)
  y = x + v5  # do something with v3 and v5
end

dothis(; nt...)

EDIT:
I’ve copied from a post above as
(v1, v2, v3, . . . . ) = read_data()

That is not what I’d normally write, which is why I overlooked that it’s something different here. I’d generally use NamedTuples all the way through, and I edited the example above accordingly.

I didn’t know that! Thank you!!!

But how is that possible?

function get_data()
  (a = "a", b = "b", c = "c", q = "q", r = "r", th = "th")
end

function func(; th = nothing, q = "", c = "empty", r = "hello")
  @show th, q, c, r
end

ful = get_data()
@show typeof(ful.q) # -> String
@show typeof(ful.r) # -> String
@show typeof("q") # -> String
@show typeof("r") # -> String
func(; ful.q, ful.r) # -> works!
func(; "q", "r") # -> fails.

The types of ful.q and ful.r are String and so are those of "q" and "r". Yet, the first call works and the second call fails. I’m sure I’m missing something.

It’s not the type of the value assigned to a variable, it’s the variable’s name:

function f(;x=1)
end

x = 3
y = 3
f(;x) # works: `f` has a keyword argument called `x`
f(;y) # does not work: `f` does not have a keyword argument called `y`

If a keyword argument appears after a semicolon and without a <keyword>=, Julia will use the name of the variable as the keyword. I never knew, but apparently when there is a property access in the argument it will use the property name instead.

Both cases involve parser magic (maybe it’s not literally the parser but it’s pretty close) and cannot be derived from first principles regarding the rest of the language. This shorthand has only been around since roughly v1.5, so it’s also somewhat less known.
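
The same naming rule applies to named tuple literals, which makes it easy to inspect without defining a function. A small sketch with made-up values:

```julia
ful = (q = "q", r = "r", th = "th")
r = "local r"

# Each shorthand form should build the same NamedTuple as its explicit twin.
from_property = (; ful.q)      # name taken from the property: q
explicit      = (; q = ful.q)

from_variable = (; r)          # name taken from the variable: r
mixed         = (; ful.q, r)   # both forms can be combined
```

A string literal such as "q" has no name to borrow, which is why func(; "q", "r") cannot work.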

A related syntax is property destructuring:

julia> (; a, b) = (; a=3, b=4, c=5)
(a = 3, b = 4, c = 5)

julia> a # `a` and `b` are now assigned
3

This is the solution I use.

"write docstring based on main method to be clear about inputs."
function f(x, y, z)
  # main function body
end
f(nt::NamedTuple) = f(nt.x, nt.y, nt.z)

This is the solution I use.

I believe you, but I’m a bit surprised. By adding the interface f(nt::NamedTuple), you lose the chance to have argument mismatches caught for you:

# -- in one module
function f(; x = "hello", y = 3.14, z = nothing)
  @show x, y, z
end
f(nt::NamedTuple) = f(; nt.x, nt.y, nt.z)

function g(; x = "hello", y = 3.14, z = nothing)
  @show x, y, z
end

read_data() = (x = "x", z = "z", y = "y")

# -- in the main program
vars = read_data()

nt = (; vars..., yy = "yy")
#^^^ I thought the name was "yy" and I didn't know that vars included "y".

f(; x = "x", z = "z", y = "y") # works
f(; x = "x", z = "z", yy = "yy") # error
f(nt) # no error
g(; nt...) # error

The problem with the f(nt::NamedTuple) interface is that extra variables are silently ignored.

Coming back to my original question: It’s tedious to write f(; x=vars.x, y=vars.y, . . .) What’s a more succinct way to write it?

We have two possible designs:

  1. Subset the original NamedTuple and send it to the function as g(; subset...).
  2. Write an extra interface to f to take all NamedTuples and send everything to the function as f(nt).

I would argue that design 1 is superior. For design 2, you have to write an extra interface f(nt::NamedTuple) and accept some possibility of argument mismatch.

Design 1 doesn’t have any drawback once you know how to subset a given NamedTuple.

The above is a contrived, stupid example, but I do make argument mismatch errors from time to time. I sometimes include the same variable twice in the argument list by mistake, and if I misspell one of them, the mistake would go unnoticed if it’s in a NamedTuple. And I do use NamedTuples to pass a set of arguments to a function.

This is not a fictitious scenario. Makie, the graphics library, accepts any keyword arguments and doesn’t check for misspellings, silently ignoring those it doesn’t recognize. As a result, it sometimes takes a lot of time to track down your misspellings.

In other words, I want to say, “this function requires this and that variables, not more”, not only in the documentation; I want to tell that to the compiler too.

I accept that possibility for the simplicity, but I also don’t manually add to the NamedTuple. (I write gather_<x>_input functions which prompt the user for values and hardcode the keys.) If you want more robust control, I suggest writing function methods for custom composite types instead of NamedTuple or kwargs....
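
A minimal sketch of the composite-type idea, assuming a keyword constructor from Base.@kwdef is acceptable. The type name SolverInput and its fields are hypothetical, chosen only to mirror the x, y, z example above.

```julia
# Hypothetical input type; field names and types are made up for illustration.
Base.@kwdef struct SolverInput
    x::String = "hello"
    y::Float64 = 3.14
    z::Union{Nothing, String} = nothing
end

# The method signature states exactly which inputs the function depends on.
f(input::SolverInput) = (input.x, input.y, input.z)

ok = f(SolverInput(x = "x", z = "z"))

# A misspelled field name is rejected when the struct is built.
misspelled = try
    SolverInput(x = "x", yy = 2.0)
    false
catch err
    err isa MethodError
end
```

With this approach the mismatch surfaces at construction time rather than being silently ignored.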