ZettEff
December 29, 2020, 4:36pm
1
I am looking for a way to split a column whose values are vectors (all of equal length) into one column per vector element.
using DataFrames
df = DataFrame(data=rand(3), list_data=[[1, 2], [3, 4], [5, 6]])
transform(df, :list_data => (col -> [el for el in col]) => [:A, :B])
The above code works, but I have no idea why. My understanding is that col represents the column df.list_data, which is of type Array{Array{Int64,1},1}. But isn't [el for el in col] also of type Array{Array{Int64,1},1}? At least typeof([el for el in df.list_data]) == typeof(df.list_data) returns true.
So why does the split into columns work?
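The type check above can be reproduced without DataFrames. The following is a minimal sketch in which list_data is a plain vector standing in for df.list_data; the variable names are only illustrative.

```julia
# Plain-Julia stand-in for df.list_data (no DataFrames needed here).
list_data = [[1, 2], [3, 4], [5, 6]]

# The comprehension walks the outer vector and collects each inner vector unchanged.
copied = [el for el in list_data]

same_type    = typeof(copied) == typeof(list_data)  # same container type
same_content = copied == list_data                  # element-wise equal
same_object  = copied === list_data                 # a new outer vector is allocated
shared_inner = copied[1] === list_data[1]           # the inner vectors are reused, not copied
```

So the comprehension only produces a shallow copy of the column; it does not change its shape or element type.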
This may be a weird question, but I want to understand the mechanics behind DataFrames.jl and Julia.
Thanks!
nilshg
December 29, 2020, 4:44pm
2
I think you might be interested in this recent blog post from Bogumil - your case is explained in the section “Multiple target columns”.
Here the elements of list_data are iterable (as they are vectors), so when multiple target columns are given, each element is split up across those columns. Your transformation col -> [el for el in col] indeed does nothing beyond returning a vector equal to its input:
julia> df.list_data == [el for el in df.list_data]
true
and so you could have shortened your example to
julia> transform(df, :list_data => identity => [:A,:B])
3×4 DataFrame
 Row │ data      list_data  A      B
     │ Float64   Array…     Int64  Int64
─────┼───────────────────────────────────
   1 │ 0.895446  [1, 2]         1      2
   2 │ 0.386052  [3, 4]         3      4
   3 │ 0.978884  [5, 6]         5      6
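To make the splitting less magical, it can be compared with building the two columns by hand. This is only a sketch: it assumes a DataFrames.jl version that supports multiple target columns (0.22 or later, as far as I know), and the names manual and auto are made up for illustration.

```julia
using DataFrames

df = DataFrame(data=rand(3), list_data=[[1, 2], [3, 4], [5, 6]])

# Explicit version: broadcast getindex to take the i-th entry of every inner vector.
manual = copy(df)
manual.A = getindex.(df.list_data, 1)
manual.B = getindex.(df.list_data, 2)

# Version from this thread: one iterable per row, spread over the target columns.
auto = transform(df, :list_data => identity => [:A, :B])

# Both approaches should yield the same table.
same = manual == auto
```

The explicit version only works when you know the vector length up front, which is also what the list of target names [:A, :B] encodes.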
ZettEff
December 29, 2020, 5:05pm
3
Great, thanks for your answer. Will check out the mentioned blog.