Creating new dataframe column!

I am trying to add a new column to a dataframe, where each row takes its value from another dataframe whenever the key column has the same value in both.

For example:

using DataFrames, Statistics
a = ["1.0"; "2.0"; "2.0"; "4.0"; "5.0"]
b = collect(range(1, 5, length=5))
df = DataFrame(:a=> a, :b=>b)

x = combine(DataFrames.groupby(df, :a), :b=>mean)
df[!, :b_new] = [x[:, 2] for i in df[:, :a]]

println(df)

Output:->
5×3 DataFrame
 Row │ a       b        b_new
     │ String  Float64  Array…
─────┼─────────────────────────────────────────
   1 │ 1.0         1.0  [1.0, 2.5, 4.0, 5.0]
   2 │ 2.0         2.0  [1.0, 2.5, 4.0, 5.0]
   3 │ 2.0         3.0  [1.0, 2.5, 4.0, 5.0]
   4 │ 4.0         4.0  [1.0, 2.5, 4.0, 5.0]
   5 │ 5.0         5.0  [1.0, 2.5, 4.0, 5.0]

However, I want to produce this dataframe:

Output:->
5×3 DataFrame
 Row │ a       b        b_new
     │ String  Float64  Float64
─────┼──────────────────────────
   1 │ 1.0         1.0      1.0
   2 │ 2.0         2.0      2.5
   3 │ 2.0         3.0      2.5
   4 │ 4.0         4.0      4.0
   5 │ 5.0         5.0      5.0

How can I achieve the expected output?

Thanks!!

Could you amend your MWE so that it works?

julia> x = combine(DataFrames.groupby(df, :a), :c=>mean)
ERROR: ArgumentError: column name :c not found in the data frame; existing most similar names are: :a and :b

Apologies!!
I have edited the question, and the example works now.

Thanks for the response!

You can use transform:

julia> transform(groupby(df, :a), :b=>mean)
5×3 DataFrame
 Row │ a       b        b_mean
     │ String  Float64  Float64
─────┼──────────────────────────
   1 │ 1.0         1.0      1.0
   2 │ 2.0         2.0      2.5
   3 │ 2.0         3.0      2.5
   4 │ 4.0         4.0      4.0
   5 │ 5.0         5.0      5.0
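
If you want the column to be named b_new as in the question, you can add a target name to the pair. Below is a minimal sketch that rebuilds df from the question; the name df_new is only for illustration. transform returns a new dataframe, while transform! adds the column to df itself.

```julia
using DataFrames, Statistics

# Same data as in the question.
a = ["1.0"; "2.0"; "2.0"; "4.0"; "5.0"]
b = collect(range(1, 5, length=5))
df = DataFrame(:a => a, :b => b)

# Group by :a, compute the mean of :b in each group, and store it
# under the name :b_new. transform keeps all original rows and
# repeats the group mean on every row of the group.
df_new = transform(groupby(df, :a), :b => mean => :b_new)

# In-place variant: adds :b_new to df directly instead of returning a copy.
transform!(groupby(df, :a), :b => mean => :b_new)

println(df)
```

The original attempt stored the whole vector x[:, 2] in every row because the comprehension never uses i. transform avoids that by matching each row to its own group.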

See Comparison with Python/R/Stata · DataFrames.jl for a list of common operations like this one.
