Replace data from specific column in dataframe

I have two dataframes as shown below and need to replace one column(df.M02) with column(df2.M02) for year 2024 only. I know this can be achieved in multiple ways but I am just trying to find better way to do this.

2×13 DataFrame
 Row │ name     type      year   sub_type  M01       M02      M03      M04      M05      M06      M07       M08       M09     
     │ String   String    Int64  String    Float64?  Float64  Float64  Float64  Float64  Float64  Float64?  Float64?  Float64 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ test-24  nulltest   2024  prev_mth   missing      3.4      5.6      7.0      8.0      9.0   missing   missing      8.5
   2 │ test-24  nulltest   2025  prev_mth   missing      3.4      5.6      7.0      8.0      9.0   missing   missing      8.5
2×13 DataFrame
 Row │ name     type      year   sub_type  M01       M02      M03      M04      M05      M06      M07       M08       M09     
     │ String   String    Int64  String    Float64?  Float64  Float64  Float64  Float64  Float64  Float64?  Float64?  Float64 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ test-23  nulltest   2024  prev_mth   missing      3.4      5.6      7.0      8.0      9.0   missing   missing      8.5
   2 │ test-24  nulltest   2025  prev_mth   missing      3.4      5.6      7.0      8.0      9.0   missing   missing      8.5

I would say:

mask1 = df.year .== 2024
mask2 = df2.year .== 2024

@assert mask1 == mask2

@views @. df.M02[mask1] = df2.M02[mask2]

If you know that the rows are identical and in order:

df.M02 = ifelse.(df.year .== 2024, df2.M02, df.M02)
using CSV, DataFrames



df1="""
 name     type      year   sub_type  M01       M02      M03      M04      M05      M06      M07       M08       M09     
 test-24  nulltest   2024  prev_mth   missing      3.41      5.6      7.0      8.0      9.0   missing   missing      8.5
 test-24  nulltest   2025  prev_mth   missing      3.41      5.6      7.0      8.0      9.0   missing   missing      8.5
 """


df1=CSV.read(IOBuffer(df1), DataFrame, delim=' ', ignorerepeated=true)


df2="""
 name     type      year   sub_type  M01       M02      M03      M04      M05      M06      M07       M08       M09     
 test-23  nulltest   2024  prev_mth   missing      3.42      5.6      7.0      8.0      9.0   missing   missing      8.5
 test-24  nulltest   2025  prev_mth   missing      3.42      5.6      7.0      8.0      9.0   missing   missing      8.5
 """

 
df2=CSV.read(IOBuffer(df2), DataFrame, delim=' ', ignorerepeated=true)

grp1=groupby(df1,:year)

grp2=groupby(df2,:year)


# (df.M02) with column(df2.M02) 

grp1[(2024,)].M02=grp2[(2024,)].M02

julia> df1
2×13 DataFrame
 Row │ name     type      year   sub_type  M01      M02      M03      M04      M05      M06      M07      M08      M0 ⋯
     │ String7  String15  Int64  String15  String7  Float64  Float64  Float64  Float64  Float64  String7  String7  Fl ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ test-24  nulltest   2024  prev_mth  missing     3.42      5.6      7.0      8.0      9.0  missing  missing     ⋯
   2 │ test-24  nulltest   2025  prev_mth  missing     3.41      5.6      7.0      8.0      9.0  missing  missing
                                                                                                       1 column omitted

A leftjoin is probably the best choice here. Unless you absolutely know that the rows in each data frame overlap, you should avoid doing stuff like df1.x = df2.x.

Here’s a solution involving a join and DataFramesMeta.jl

@chain df begin 
    leftjoin(df1, @select(df2, :id, :M02_2 = :M02), on = :id, )
    @rtransform :M02 = :year == 2004 ? :M02_2 : M02
end
1 Like

despite all attempts to give an answer, the question seems to lie precisely on this aspect: the problem is not well defined.