How to properly add two new columns (in dataframe) from a function that returns two arrays?

Hello, I am a Julia newbie and I have a question.
The example below is a simplified version of my current problem:

using DataFrames

function myfun(id)
  a = [id .+ 1, id .+ 2, id .+ 2]
  b = [id .- 1, id .- 2, id .- 3]
  return [a, b]
end

mydf = DataFrame(id = [7, 8])
combine(groupby(mydf, :id), :id => myfun)

For each group (groupby id in my data mydf), I run a function, myfun, which generates two arrays based on a variable in mydf.
What I want to do is to add the two arrays (a and b) from the function to the original data as two columns.

However, the command below generates only a single column:

combine(groupby(mydf, :id), :id => myfun)

4×2 DataFrame
 Row │ id     id_myfun
     │ Int64  Array…
─────┼──────────────────────────
   1 │     7  [[8], [9], [9]]
   2 │     7  [[6], [5], [4]]
   3 │     8  [[9], [10], [10]]
   4 │     8  [[7], [6], [5]]
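
A likely explanation for this shape, sketched outside DataFrames: in a grouped `combine`, the function receives the group's `id` column as a vector (here with one element), not as a scalar, so every `id .+ 1` is itself a one-element vector. The returned two-element vector `[a, b]` is then spread over two rows of a single column. The variable names `grouped_style` and `scalar_style` below are only for illustration.

```julia
function myfun(id)
    a = [id .+ 1, id .+ 2, id .+ 2]
    b = [id .- 1, id .- 2, id .- 3]
    return [a, b]
end

# What a grouped combine effectively passes: a one-element vector.
grouped_style = myfun([7])
println(grouped_style)   # two elements, each a vector of one-element vectors

# With a scalar id, a and b are plain integer vectors.
scalar_style = myfun(7)
println(scalar_style)
```

So the function either has to be applied to a scalar, or its two results have to be mapped explicitly onto two target columns.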

Ideally, I want my result to look like this:

 Row │ id     a      b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     7      8      6
   2 │     7      9      5
   3 │     7      9      4
   4 │     8      9      7
   5 │     8     10      6
   6 │     8     10      5

Is there anyone who could help me?

Thanks!


Here's one way:

julia> df_ab = transform(mydf, :id => ByRow(myfun) => [:a, :b])
2×3 DataFrame
 Row │ id     a            b
     │ Int64  Array…       Array…
─────┼───────────────────────────────
   1 │     7  [8, 9, 9]    [6, 5, 4]
   2 │     8  [9, 10, 10]  [7, 6, 5]

julia> flatten(df_ab, [:a, :b])
6×3 DataFrame
 Row │ id     a      b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     7      8      6
   2 β”‚     7      9      5
   3 β”‚     7      9      4
   4 β”‚     8      9      7
   5 β”‚     8     10      6
   6 β”‚     8     10      5

Another way

julia> function myfun(id)
           a = [id .+ 1, id .+ 2, id .+ 2]
           b = [id .- 1, id .- 2, id .- 3]
           id = fill(id, length(a))
           (; id, a, b)
       end;

julia> vcat(DataFrame.(myfun.([7, 8]))...)
6×3 DataFrame
 Row │ id     a      b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     7      8      6
   2 β”‚     7      9      5
   3 β”‚     7      9      4
   4 β”‚     8      9      7
   5 β”‚     8     10      6
   6 β”‚     8     10      5
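
A small variation on the same idea, as a sketch rather than a recommendation: DataFrames.jl also accepts `reduce(vcat, dfs)` for a vector of data frames, which avoids splatting a long collection into `vcat` when there are many ids. The name `myfun_nt` is hypothetical and only used here so it does not clash with the original `myfun`.

```julia
using DataFrames

# Hypothetical variant of myfun that returns a NamedTuple of equal-length columns.
function myfun_nt(id)
    a = [id + 1, id + 2, id + 2]
    b = [id - 1, id - 2, id - 3]
    return (; id = fill(id, length(a)), a, b)
end

# Build one small DataFrame per id, then stack them vertically.
result = reduce(vcat, DataFrame.(myfun_nt.([7, 8])))
println(result)
```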

Many thanks, @sijo and @rikh!
These are perfect answers for me.

Some alternatives for reference:

combine(groupby(mydf, :id), :id => Base.splat(hcat)∘myfun∘only => [:a, :b])

combine(groupby(mydf, :id), :id => NamedTuple{(:a, :b)}∘myfun∘only => AsTable)

# If the `id` column is not required in the result:
combine(mydf, :id => Base.splat(vcat)∘ByRow(Base.splat(hcat)∘myfun) => [:a, :b])

There's also an issue discussing the idea of making this easier.


Thank you for listing all these excellent alternatives.

One quick follow-up question:
Is their performance comparable? I wonder whether some of them are clearly faster than the others.

I don't know... probably best to run benchmarks for your real use case.
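
As a rough starting point, here is a minimal timing sketch for two of the approaches from this thread. It makes several assumptions: `bigdf` is a made-up input with 10,000 distinct ids, the helper names `via_flatten` and `via_combine` are hypothetical, and plain `@elapsed` after a warm-up call is only a coarse measurement (BenchmarkTools.jl's `@btime` would give more reliable numbers). Results on real data may look quite different.

```julia
using DataFrames

# Same function as in the question; here it is always called with a scalar id.
function myfun(id)
    a = [id .+ 1, id .+ 2, id .+ 2]
    b = [id .- 1, id .- 2, id .- 3]
    return [a, b]
end

# Hypothetical larger input; replace with your real data.
bigdf = DataFrame(id = 1:10_000)

# Approach 1: row-wise transform, then flatten the two array columns.
via_flatten(df) = flatten(transform(df, :id => ByRow(myfun) => [:a, :b]), [:a, :b])

# Approach 2: grouped combine that returns a 3×2 matrix per group.
via_combine(df) = combine(groupby(df, :id),
                          :id => Base.splat(hcat) ∘ myfun ∘ only => [:a, :b])

# Warm-up calls so that compilation time is not measured.
r1 = via_flatten(bigdf)
r2 = via_combine(bigdf)

t1 = @elapsed via_flatten(bigdf)
t2 = @elapsed via_combine(bigdf)
println("flatten: ", t1, " s, grouped combine: ", t2, " s")
```

It is also worth checking that `r1 == r2` before comparing timings, so that the approaches being compared really produce the same table.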

Thanks! I will do some tests.