[DataFrames Question]: How to convert single column with row of dictionary to multiple columns

Question:

If I have a dataframe with a column that has rows which are dictionaries, whats the appropriate way to convert each key to a column and fill the elements with the value. I assume its through the transform! function, but I’m not sure exactly how to do this.

Example:

Here is one way (interested to see other approaches):

## Setup
using DataFrames
df = DataFrame(x = [Dict(:y => 1.0, :z => 2.0), Dict(:yz=>1.0, :y=>0.0, :z=>0.5)])

## Scan first to get list of possible column names:
keylist = Set()

for row in eachrow(df)
   push!(keylist, keys(row.x)...)
end

## Now create the transformed dataframe:
for k in keylist
    df[!,k] = [haskey(row.x,k) ? row.x[k] : missing for row in eachrow(df)]
end
2 Likes

DataFrame constructors accept Dicts, so the simplest is probably:

julia> reduce(vcat, DataFrame.(df.x), cols = :union)
2Γ—3 DataFrame
 Row β”‚ y      z      yz
     β”‚ Int64  Int64  Int64?
─────┼───────────────────────
   1 β”‚     1      2  missing
   2 β”‚     4      5        3

which you can then just hcat onto your existing table:

julia> df = DataFrame(x = [Dict(:y => 1, :z => 2), Dict(:yz => 3, :y => 4, :z => 5)])
2Γ—1 DataFrame
 Row β”‚ x
     β”‚ Dict…
─────┼────────────────────────────
   1 β”‚ Dict(:y=>1, :z=>2)
   2 β”‚ Dict(:yz=>3, :y=>4, :z=>5)

julia> hcat(df, reduce(vcat, DataFrame.(df.x), cols = :union))
2Γ—4 DataFrame
 Row β”‚ x                           y      z      yz
     β”‚ Dict…                       Int64  Int64  Int64?
─────┼───────────────────────────────────────────────────
   1 β”‚ Dict(:y=>1, :z=>2)              1      2  missing
   2 β”‚ Dict(:yz=>3, :y=>4, :z=>5)      4      5        3
5 Likes

@nilshg and @jd-foster thanks a bunch, this does what I need. I checked the timings with @time, no significant difference I see. Is there a reason not to do this with transform?

transform needs matching keys for each row, so if your second Dict has :yz as a key, it won’t work.

1 Like