# Expanding Named Tuples

Starting from this situation (or similar),

``````julia> hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
2×2 DataFrame
 Row │ a      x1
     │ Int64  NamedTup…
─────┼───────────────────────
   1 │     1  (b = 1, c = 2)
   2 │     2  (b = 3, c = 4)

``````

is there a direct way to obtain the result below?

More generally, in what ways can a flat DataFrame be obtained from such a starting point?

``````2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4
``````

Not clear what the real situation is, but this gives the desired result:

``````julia> hcat( DataFrame(a=[1,2]), DataFrame([(b=1,c=2),(b=3,c=4)]) )
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4
``````

But of course, I have just changed the situation to get this.
What is your real starting point?


No real situation.

I would expect a solution of the following type (i.e., one that takes the dataframe and the column to expand as arguments), but without `hcat`.

``````dfnt=hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
hcat(dfnt.a,DataFrame(dfnt.x1))
``````

The problem (as explained in the PR you linked) is that we already define `hcat(df::AbstractDataFrame, v::AbstractVector)`. We would like to add something along the lines of

``````function hcat(df::AbstractDataFrame, t::Any)
    if Tables.istable(t)
        hcat(df, DataFrame(t; copycols = false))
    end
end
``````

But of course, we can’t use dispatch to check if something is a `Table` in the Tables.jl-sense. And there are many things that are both `<: Vector` and satisfy `Tables.istable`, like a vector of named tuples.

But we are post 1.0, so we can’t break `hcat(df::AbstractDataFrame, v::AbstractVector)`. So no, you won’t be able to do

``````hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
``````

and have it automatically flatten. That would break a post-1.0 guarantee of stability.
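To make the overlap described above concrete, here is a minimal sketch (assuming Tables.jl is available in the environment alongside DataFrames.jl) showing that a vector of named tuples is both an `AbstractVector` and a Tables.jl table, which is why dispatch alone cannot tell the two cases apart:

```julia
using DataFrames, Tables

v = [(b = 1, c = 2), (b = 3, c = 4)]

# The same value satisfies both conditions.
is_vector = v isa AbstractVector
is_table = Tables.istable(v)

# Converting explicitly removes the ambiguity: the caller decides
# that `v` should be treated as a table rather than as one column.
flat = hcat(DataFrame(a = [1, 2]), DataFrame(v))
```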


Hi @pdeffebach,

I’m not sure if I understand your answer correctly (my knowledge of Julia is very limited), but I would like to be sure that I have asked my question correctly and so I try to ask it again.

If, after somehow transforming a dataframe (perhaps one obtained by reading a JSON file), I get some columns that are vectors of named tuples, how can I expand/flatten them into distinct columns named after the keys of the named tuples?

PS
I don’t intend to use / modify the hcat function

Calling `DataFrame(v)` should work.

If you have this scenario

``````julia> vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))]
2-element Vector{NamedTuple{(:a, :b), Tuple{Int64, NamedTuple{(:c, :d), Tuple{Int64, Int64}}}}}:
(a = 1, b = (c = 2, d = 3))
(a = 4, b = (c = 5, d = 6))
``````

then I’m not 100% sure what the solution is, but I’m sure other people can help out. Here is one solution using recursion:

``````julia> vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))];

julia> function unnest!(d, nt)
           for (n, v) in pairs(nt)
               if v isa NamedTuple
                   unnest!(d, v)
               else
                   push!(d, n => v)
               end
           end
       end;

julia> function unnest(nt)
           d = Dict{Symbol, Any}()
           unnest!(d, nt)
           return d
       end;

julia> Tables.istable(unnest.(vnt))
true

julia> DataFrame(unnest.(vnt))
2×3 DataFrame
 Row │ a      d      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      2
   2 │     4      6      5
``````

But this solution has some problems. In particular, it won’t have consistent column ordering (this can be fixed using an ordered dict from `OrderedCollections.jl`).
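As an alternative to an ordered dict, the ordering problem can also be sidestepped by flattening each row into a `NamedTuple`, which keeps field order. This is only a sketch of that different technique, not the `OrderedCollections.jl` approach itself; `flatten_nt` is a hypothetical name, and the sketch assumes the nested field names do not collide.

```julia
using DataFrames

# Hypothetical helper: recursively flatten nested NamedTuples into one
# NamedTuple, preserving the order in which fields are encountered.
function flatten_nt(nt::NamedTuple)
    parts = (v isa NamedTuple ? flatten_nt(v) : NamedTuple{(k,)}((v,))
             for (k, v) in pairs(nt))
    return foldl(merge, parts; init = NamedTuple())
end

vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))]

# A vector of NamedTuples is a table, so it converts directly.
flat = DataFrame(flatten_nt.(vnt))
```

Note that `merge` lets later fields overwrite earlier ones with the same name, so colliding names would silently lose data.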

But I feel like we have good solutions to this problem that I am not finding at the moment.

EDIT: Also look at JSONTables.jl, for a particular JSON-oriented use-case

``````julia> t=hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
2×2 DataFrame
 Row │ a      x1
     │ Int64  NamedTup…
─────┼───────────────────────
   1 │     1  (b = 1, c = 2)
   2 │     2  (b = 3, c = 4)
``````

From a column like `x1` above, you can proceed with:

``````julia> DataFrame(t[!,:x1])
2×2 DataFrame
 Row │ b      c
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     3      4
``````

Just saying, perhaps that’s what you are looking for.

No. This is not exactly what I am looking for, because I would like to have the whole dataframe as an output.

Something like this (which came to my mind trying to figure out the functions of @pdeffebach )

``````df=DataFrame(vnt)
transform(df, :b=>ByRow(x->(x.c,x.d))=>[:c,:d])
``````

Well, the whole DataFrame is then this:

``````julia> hcat(DataFrame(a=t[!,:a]),DataFrame(t[!,:x1]))
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4
``````

But, never mind, it seems I am missing the point. I also didn’t read the PR you linked.

Use `AsTable` as the output:

``````df=DataFrame(vnt)
transform(df, :b=>ByRow(identity) => AsTable)
``````
On dispatching on `Tables.istable`: wrapping the result in a `Val` makes that possible.
``````istableval(x) = Val(Tables.istable(x)) # const prop should make this infer correctly
hcat(df::AbstractDataFrame, t) = _hcat(istableval(t), df, t)
_hcat(::Val{true}, df, t) = hcat(df, DataFrame(t; copycols = false))
_hcat(::Val{false}, df, t) = ...
``````

I placed the `Val` as the first argument of `_hcat` because `hcat` often accepts `Vararg`s.

Fair enough! Perhaps this technique can be used internally in DataFrames more.

Still doesn’t get around the problem that things are both `<: AbstractVector` and tables.
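For illustration, here is a self-contained sketch of the `Val` trait pattern applied to a hypothetical function `attach` (deliberately not `hcat`, so nothing in DataFrames.jl is overloaded). The column name `:x1` in the non-table branch is an assumption made for this example. It shows that a value which is both an `AbstractVector` and a table always takes the table branch, which is exactly the behavior change discussed above.

```julia
using DataFrames, Tables

# Trait: lift the result of `Tables.istable` into the type domain.
istableval(x) = Val(Tables.istable(x))

# Hypothetical entry point; dispatches on the trait value.
attach(df::AbstractDataFrame, t) = _attach(istableval(t), df, t)

# Table branch: expand `t` into its own columns.
_attach(::Val{true}, df, t) = hcat(df, DataFrame(t; copycols = false))

# Non-table branch: add `t` as a single column (name assumed here).
_attach(::Val{false}, df, t) = insertcols!(copy(df), :x1 => t)

df = DataFrame(a = [1, 2])
expanded = attach(df, [(b = 1, c = 2), (b = 3, c = 4)])  # table branch
single = attach(df, [10, 20])                            # vector branch
```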

It seems that `ByRow` is not needed:

``````select(transform(df, :b=>identity=>AsTable), Not(:b))
``````
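The pattern above can be wrapped in a small helper that takes the dataframe and the column to expand as arguments. This is a minimal sketch; `expand_column` is a hypothetical name, not part of DataFrames.jl, and it uses the `ByRow(identity) => AsTable` form suggested earlier in the thread.

```julia
using DataFrames

# Hypothetical helper: replace column `col` (a vector of NamedTuples)
# with one column per NamedTuple field, keeping all other columns.
expand_column(df::AbstractDataFrame, col) =
    select(df, Not(col), col => ByRow(identity) => AsTable)

df = DataFrame(a = [1, 2], x1 = [(b = 1, c = 2), (b = 3, c = 4)])
flat = expand_column(df, :x1)
```

Because it goes through `select`, the helper returns a new dataframe and leaves `df` unchanged.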

Just out of curiosity, why doesn’t the following expression work like the previous one?

``````select(transform(df, :b=>AsTable), Not(:b))
``````

I think it probably should work, and I’ve filed an issue here. But no guarantees because the mini-language is complicated enough as-is.
