I want to report the following results and would like comments on how to interpret them.
The problem is not fictitious.
It derives from the use case treated here, where, among other things, the error description is slightly different. Why?
using DataFrames
df=DataFrame(x=rand(1:5,10),y=1:10)
combine(groupby(df,:x), :y=>last)
df=DataFrame(x=rand(1:5,10),y=[(nt=rand(1:5),) for _ in 1:10])
combine(groupby(df,:x), :y=>last)
# julia> combine(groupby(df,:x), :y=>last)
# ERROR: ArgumentError: a single value or vector result is required (got NamedTuple{(:nt,), Tuple{Int64}})
julia> cdf=combine(groupby(df,:x), :y=>Ref∘last)
4×2 DataFrame
 Row │ x      y_Ref_last
     │ Int64  NamedTupl…
─────┼─────────────────────────
   1 │     1  (f1 = 5, f2 = 4)
   2 │     2  (f1 = 4, f2 = 4)
   3 │     3  (f1 = 3, f2 = 2)
   4 │     5  (f1 = 5, f2 = 3)
julia> combine(groupby(df,:x), :y=>last=>AsTable)
4×3 DataFrame
 Row │ x      f1     f2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      4
   2 │     2      4      4
   3 │     3      3      2
   4 │     5      5      3
Can you please state the specific question you have? Everything you present in the post above works as expected (apart from the fact that you seem to have reported results from a different df than the one you create).
If the error message is what is confusing you, the reason for the error is the following. Your operation returns a named tuple, which is a multi-column result. Returning a multi-column result is allowed only if AsTable or a list of column names is passed as the target column names.
This question and your previous one both show the approach we take in DataFrames.jl (as opposed to R). We want to make sure that the user gets a correct result. If something is ambiguous, we throw an error. This differs from R, which tries to guess what the user wanted in case of ambiguity. We chose the “safety first” approach, as it is preferred in production applications (where you do not want to silently get a wrong result).
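The three ways of handling a named-tuple result can be put side by side. This is only a minimal sketch: the small deterministic `df` and the target name `:y_last` are made up for illustration, not taken from the original post.

```julia
using DataFrames

# Hypothetical data: two groups, each cell of :y holds a named tuple
df = DataFrame(x = [1, 1, 2, 2],
               y = [(f1 = 1, f2 = 2), (f1 = 3, f2 = 4),
                    (f1 = 5, f2 = 6), (f1 = 7, f2 = 8)])
gdf = groupby(df, :x)

# Wrap the result in Ref so the named tuple is treated as a single value
one_cell = combine(gdf, :y => Ref ∘ last => :y_last)

# Ask explicitly for a multi-column result: one column per field
expanded = combine(gdf, :y => last => AsTable)

# Same expansion, but with the target column names listed explicitly
named = combine(gdf, :y => last => [:f1, :f2])
```

Without one of these hints, `combine(gdf, :y => last)` throws the `ArgumentError` quoted above, because the target is a single column while the function returns a multi-column value.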
df=DataFrame(x=rand(1:5,10),y=[(f1=rand(1:5),f2=rand(1:5)) for _ in 1:10])
As with push I can insert a namedtuple as the value of a cell, so I would have expected the result of a function inside combine (here, last applied to a vector of named tuples) to be treated the same way.
PS
I know of the other situations where the cell with a named tuple has to be “expanded”.
I don’t know if it is possible, or even useful, to make the two situations coexist.
As with push I can insert a namedtuple as the value of a cell
You cannot. If you push a NamedTuple, it will always get expanded to multiple columns. If you want to push a NamedTuple into a single cell, you need to wrap it (e.g. in a vector).
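A small sketch of the difference, assuming empty data frames set up for illustration (the names `wide`, `cells` and `nt` are hypothetical):

```julia
using DataFrames

nt = (f1 = 1, f2 = 2)

# Pushing the named tuple as a row: each field goes to the column of the same name
wide = DataFrame(f1 = Int[], f2 = Int[])
push!(wide, nt)

# Storing the whole named tuple in one cell: wrap it in a one-element vector,
# so that it is the single value of a one-column row
cells = DataFrame(y = typeof(nt)[])
push!(cells, [nt])
```

Here `wide` ends up with two columns and one row, while `cells` has a single column `y` whose only cell holds the named tuple itself.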