DataFrame vector to columns

I was looking for a way to split a column whose values are vectors (all of the same length) into one column per vector element.

```julia
using DataFrames

df = DataFrame(data=rand(3), list_data=[[1, 2], [3, 4], [5, 6]])
transform(df, :list_data => (col -> [el for el in col]) => [:A, :B])
```

The code above works, but I don't understand why. My understanding is the following:

`col` is the column `df.list_data`, which has type `Array{Array{Int64,1},1}` (a vector of vectors).

But isn't `[el for el in col]` also of type `Array{Array{Int64,1},1}`?

Indeed, `typeof([el for el in df.list_data]) == typeof(df.list_data)` returns `true`.

So why does the split into separate columns work?

This may be a strange question, but I'd like to understand the mechanics behind DataFrames and Julia.

Thanks!

You might be interested in this recent blog post by Bogumil; your case is explained in the section “Multiple target columns”.

Here the elements of `list_data` are vectors, so when you pass multiple target column names, each element is split across those columns. Your transformation `col -> [el for el in col]` indeed changes nothing; it just builds a new vector holding the same elements:

```julia
julia> df.list_data == [el for el in df.list_data]
true
```
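To make "changes nothing" concrete, here is a small check in plain Julia (no DataFrames needed). `list_data` below is only a stand-in for `df.list_data`; the point is that the comprehension rebuilds the outer vector (a shallow copy) while the element type and the inner vectors stay the same.

```julia
# Stand-in for df.list_data: a vector whose elements are vectors
list_data = [[1, 2], [3, 4], [5, 6]]

# The comprehension from the question just collects the same elements again
copied = [el for el in list_data]

println(Array{Array{Int,1},1} === Vector{Vector{Int}})  # true: the two names denote the same type
println(typeof(copied) == typeof(list_data))            # true: element type unchanged
println(copied == list_data)                            # true: same contents
println(copied === list_data)                           # false: a new outer vector was allocated
println(copied[1] === list_data[1])                     # true: inner vectors are shared, not copied
```

So whatever splits the vectors into columns cannot come from the comprehension; it has to happen inside `transform` itself.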

So you could shorten your example to:

```julia
julia> transform(df, :list_data => identity => [:A, :B])
3×4 DataFrame
 Row │ data      list_data  A      B
     │ Float64   Array…     Int64  Int64
─────┼───────────────────────────────────
   1 │ 0.895446  [1, 2]         1      2
   2 │ 0.386052  [3, 4]         3      4
   3 │ 0.978884  [5, 6]         5      6
```
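As far as I understand the DataFrames.jl documentation for `select`/`transform`, when several target column names are given and the function returns a vector, each element of that vector must support `keys`, all elements must return the same keys, and one column is created per key. For a vector such as `[1, 2]`, `keys` returns `Base.OneTo(2)`, which is why each inner vector is spread over `:A` and `:B`. The sketch below imitates that rule in plain Julia. `split_columns` is a hypothetical helper written only for illustration; it is not part of DataFrames and ignores details such as how non-integer keys are mapped to column names.

```julia
# Hypothetical helper (not a DataFrames.jl function): splits a vector of
# equal-length vectors into one column per element position, mimicking
# the `source => fun => [:A, :B]` behaviour described above.
function split_columns(rows::AbstractVector, names::Vector{Symbol})
    isempty(rows) && throw(ArgumentError("need at least one row"))
    ks = keys(first(rows))                      # e.g. Base.OneTo(2) for [1, 2]
    all(r -> keys(r) == ks, rows) ||
        throw(ArgumentError("all elements must have identical keys"))
    length(ks) == length(names) ||
        throw(ArgumentError("need exactly one target name per key"))
    # Build one column per key by picking that position out of every row
    cols = Tuple([r[k] for r in rows] for k in ks)
    return NamedTuple{Tuple(names)}(cols)
end

list_data = [[1, 2], [3, 4], [5, 6]]
result = split_columns(list_data, [:A, :B])
println(result.A)   # [1, 3, 5]
println(result.B)   # [2, 4, 6]
```

Passing vectors of different lengths (for example `[[1, 2], [3]]`) makes the keys differ, and the sketch throws an `ArgumentError` instead of guessing.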


Great, thanks for your answer. I'll check out the blog post.