Expanding Named Tuples

rocco_sprmnt21 · June 5, 2021, 1:40pm

Starting from this situation (or similar),

julia> hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
2×2 DataFrame
 Row │ a      x1             
     │ Int64  NamedTup…      
─────┼───────────────────────
   1 │     1  (b = 1, c = 2)
   2 │     2  (b = 3, c = 4)

is there a direct way to obtain the result shown below?

More generally, in what ways can a flat DataFrame be obtained from such a starting point?

2×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4

oheil · June 5, 2021, 1:49pm

Not clear what the real situation is, but this gives the desired result:

julia> hcat( DataFrame(a=[1,2]), DataFrame([(b=1,c=2),(b=3,c=4)]) )
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4

But of course, I have just changed the situation to get this.
What is your real starting point?

rocco_sprmnt21 · June 5, 2021, 2:03pm

No real situation.
I was reading this PR and I asked myself the question.

I would expect a solution of the following type (i.e. one that takes the dataframe and the column to expand as arguments), but without hcat.

dfnt=hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
hcat(dfnt.a,DataFrame(dfnt.x1))

pdeffebach · June 5, 2021, 2:09pm

The problem (as explained in the PR you linked) is that we already define hcat(df::AbstractDataFrame, v::AbstractVector). We would like to add something along the lines of

function hcat(df::AbstractDataFrame, t::Any)
    if Tables.istable(t)
        hcat(df, DataFrame(t; copycols = false))
    end
end

But of course, we can’t use dispatch to check if something is a Table in the Tables.jl-sense. And there are many things that are both <: Vector and satisfy Tables.istable, like a vector of named tuples.

But we are post 1.0, so we can’t break hcat(df::AbstractDataFrame, v::AbstractVector). So no, you won’t be able to do

hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])

and have it automatically flatten. That would break a post-1.0 guarantee of stability.
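To make the ambiguity concrete, here is a minimal sketch, assuming DataFrames.jl 1.x and Tables.jl are installed (the variable names are illustrative). A vector of named tuples is both an AbstractVector and a table in the Tables.jl sense, so the existing vector method applies unless the caller converts explicitly.

```julia
using DataFrames, Tables

df = DataFrame(a = [1, 2])
v = [(b = 1, c = 2), (b = 3, c = 4)]

# The same object satisfies both descriptions.
v isa AbstractVector
Tables.istable(v)

# Existing behaviour: the vector is appended as a single column named x1.
nested = hcat(df, v)

# Explicit conversion: the named-tuple fields become separate columns.
flat = hcat(df, DataFrame(v))
```

Because the conversion is spelled out by the caller, nothing about the existing hcat method has to change.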

rocco_sprmnt21 · June 5, 2021, 2:27pm

Hi @pdeffebach,

I’m not sure if I understand your answer correctly (my knowledge of Julia is very limited), but I would like to be sure that I have asked my question correctly and so I try to ask it again.

If, after somehow transforming a dataframe (perhaps obtained by reading a JSON file?), I get some columns that are vectors of named tuples, how can I expand/flatten them to obtain distinct columns corresponding to the keys of the named tuples?

PS
I don’t intend to use / modify the hcat function

pdeffebach · June 5, 2021, 2:49pm

Calling DataFrame(v) should work.

If you have this scenario

julia> vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))]
2-element Vector{NamedTuple{(:a, :b), Tuple{Int64, NamedTuple{(:c, :d), Tuple{Int64, Int64}}}}}:
 (a = 1, b = (c = 2, d = 3))
 (a = 4, b = (c = 5, d = 6))

then I’m not 100% sure what the solution is, but I’m sure other people can help out. Here is one solution with recursion

julia> vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))];

julia> function unnest!(d, nt)
           for (n, v) in pairs(nt)
               if v isa NamedTuple
                   unnest!(d, v)
               else
                   push!(d, n => v)
               end
           end
       end;

julia> function unnest(nt)
           d = Dict{Symbol, Any}()
           unnest!(d, nt)
           return d
       end;

julia> Tables.istable(unnest.(vnt))
true

julia> DataFrame(unnest.(vnt))
2×3 DataFrame
 Row │ a      d      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      3      2
   2 │     4      6      5

But this solution has some problems. In particular, it won’t have consistent column ordering (this can be fixed using an ordered dict from OrderedCollections.jl).

But I feel like we have good solutions to this problem that I am not finding at the moment.

EDIT: Also look at JSONTables.jl, for a particular JSON-oriented use-case
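One way to sidestep the column-ordering problem mentioned above, sketched here as an alternative to the Dict-based approach (it is not the OrderedCollections.jl route), is to flatten into a NamedTuple, which keeps field order. The helper name flatten_nt is hypothetical, and the sketch assumes field names are unique across nesting levels, since merge silently keeps the last value for a repeated name.

```julia
using DataFrames

# Hypothetical helper: recursively flatten nested NamedTuples.
# Field order follows the order of appearance in the input.
function flatten_nt(nt::NamedTuple)
    # Each field becomes either its flattened contents or a one-field NamedTuple.
    parts = (v isa NamedTuple ? flatten_nt(v) : NamedTuple{(k,)}((v,))
             for (k, v) in pairs(nt))
    return foldl(merge, parts; init = NamedTuple())
end

vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))]

flat_rows = flatten_nt.(vnt)   # vector of flat NamedTuples
df_flat = DataFrame(flat_rows) # columns appear in the order a, c, d
```

The construction of one-field named tuples from runtime symbols is not type-stable, so this is meant for small, exploratory data rather than hot loops.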

oheil · June 5, 2021, 3:48pm

julia> t=hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
2×2 DataFrame
 Row │ a      x1
     │ Int64  NamedTup…
─────┼───────────────────────
   1 │     1  (b = 1, c = 2)
   2 │     2  (b = 3, c = 4)

From a column like x1 above, you can proceed with:

julia> DataFrame(t[!,:x1])
2×2 DataFrame
 Row │ b      c
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     3      4

Just saying, perhaps that’s what you are looking for.

rocco_sprmnt21 · June 5, 2021, 5:36pm

No. This is not exactly what I am looking for, because I would like to have the whole dataframe as an output.

Something like this (which came to my mind trying to figure out the functions of @pdeffebach )

df=DataFrame(vnt)
transform(df, :b=>ByRow(x->(x.c,x.d))=>[:c,:d])

oheil · June 5, 2021, 5:44pm

Well, the whole DataFrame is then this:

julia> hcat(DataFrame(a=t[!,:a]),DataFrame(t[!,:x1]))
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      2
   2 │     2      3      4

But, never mind, it seems I am missing the point. I also didn’t read the PR you linked.

pdeffebach · June 5, 2021, 6:37pm

Use AsTable as the output:

df=DataFrame(vnt)
transform(df, :b=>ByRow(identity) => AsTable)
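For reference, a self-contained sketch of what the AsTable output does, assuming DataFrames.jl 1.x and the vnt vector defined earlier in the thread. transform keeps the original columns and appends one column per field of the named tuples; select can do the expansion while leaving the nested column out.

```julia
using DataFrames

vnt = [(a = 1, b = (c = 2, d = 3)), (a = 4, b = (c = 5, d = 6))]
df = DataFrame(vnt)

# transform keeps :a and :b and adds the fields of :b as new columns.
wide = transform(df, :b => ByRow(identity) => AsTable)

# select with Not(:b) keeps every other column and adds the expanded fields.
flat = select(df, Not(:b), :b => ByRow(identity) => AsTable)
```

The new column names come from the field names of the named tuples, so the rows of the nested column need to share the same fields.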

Elrod · June 5, 2021, 6:42pm

istableval(x) = Val(Tables.istable(x)) # const prop should make this infer correctly
hcat(df::AbstractDataFrame, t) = _hcat(istableval(t), df, t)
_hcat(::Val{true}, df, t) = hcat(df, DataFrame(t; copycols = false))
_hcat(::Val{false}, df, t) = ...

I placed the Val as the first argument of _hcat because hcat often accepts Varargs.
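A runnable variant of this pattern, written as a separate wrapper so that it does not redefine Base.hcat. The names hcat_flat and _hcat_flat are hypothetical, the copycols keyword is dropped for simplicity, and the Val{false} branch simply falls back to the ordinary hcat.

```julia
using DataFrames, Tables

# Lift the runtime Bool into the type domain so dispatch can branch on it.
istableval(x) = Val(Tables.istable(x))

# Hypothetical wrapper: flatten table-like arguments, pass others through.
hcat_flat(df::AbstractDataFrame, t) = _hcat_flat(istableval(t), df, t)
_hcat_flat(::Val{true}, df, t) = hcat(df, DataFrame(t))
_hcat_flat(::Val{false}, df, t) = hcat(df, t)

df = DataFrame(a = [1, 2])
expanded = hcat_flat(df, [(b = 1, c = 2), (b = 3, c = 4)])  # table: fields become columns
appended = hcat_flat(df, [10, 20])                          # plain vector: one new column
```

Note that this wrapper deliberately treats a vector of named tuples as a table, which is exactly the behaviour that hcat itself cannot adopt without breaking existing code.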

pdeffebach · June 5, 2021, 6:48pm

Fair enough! Perhaps this technique can be used internally in DataFrames more.

Still, it doesn’t get around the problem that some things are both <: AbstractVector and tables.

rocco_sprmnt21 · June 5, 2021, 7:22pm

It seems that ByRow is not needed:

select(transform(df, :b=>identity=>AsTable), Not(:b))

Just out of curiosity, why doesn’t the following expression work like the previous one?

select(transform(df, :b=>AsTable), Not(:b))

pdeffebach · June 5, 2021, 8:13pm

I think it probably should work, and I’ve filed an issue here. But no guarantees because the mini-language is complicated enough as-is.
