Question:
If I have a dataframe with a column that has rows which are dictionaries, whats the appropriate way to convert each key
to a column and fill the elements with the value
. I assume its through the transform!
function, but Iβm not sure exactly how to do this.
Example:
Here is one way (interested to see other approaches):
## Setup
using DataFrames
df = DataFrame(x = [Dict(:y => 1.0, :z => 2.0), Dict(:yz=>1.0, :y=>0.0, :z=>0.5)])
## Scan first to get list of possible column names:
keylist = Set()
for row in eachrow(df)
push!(keylist, keys(row.x)...)
end
## Now create the transformed dataframe:
for k in keylist
df[!,k] = [haskey(row.x,k) ? row.x[k] : missing for row in eachrow(df)]
end
2 Likes
DataFrame
constructors accept Dict
s, so the simplest is probably:
julia> reduce(vcat, DataFrame.(df.x), cols = :union)
2Γ3 DataFrame
Row β y z yz
β Int64 Int64 Int64?
ββββββΌβββββββββββββββββββββββ
1 β 1 2 missing
2 β 4 5 3
which you can then just hcat onto your existing table:
julia> df = DataFrame(x = [Dict(:y => 1, :z => 2), Dict(:yz => 3, :y => 4, :z => 5)])
2Γ1 DataFrame
Row β x
β Dictβ¦
ββββββΌββββββββββββββββββββββββββββ
1 β Dict(:y=>1, :z=>2)
2 β Dict(:yz=>3, :y=>4, :z=>5)
julia> hcat(df, reduce(vcat, DataFrame.(df.x), cols = :union))
2Γ4 DataFrame
Row β x y z yz
β Dictβ¦ Int64 Int64 Int64?
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β Dict(:y=>1, :z=>2) 1 2 missing
2 β Dict(:yz=>3, :y=>4, :z=>5) 4 5 3
5 Likes
@nilshg and @jd-foster thanks a bunch, this does what I need. I checked the timings with @time
, no significant difference I see. Is there a reason not to do this with transform
?
transform
needs matching keys for each row, so if your second Dict
has :yz
as a key, it wonβt work.
1 Like