Make a Copy of a DataFrame Row

I am stuck on this for loop. I need a copy of a dataframe row so I can modify it and then push the modified row onto the dataframe.

My K-MF label means a male and female kiwi are both calling at the same time. I want to change this so I have 2 labels (or rows): one for the female call and one for the male call.

This is all happening in a function that builds my training dataset which is quite large and growing.

for row in eachrow(data_frame)
            # correct to Male, Female, Close to match newer annotations
            if row.species == "K-M"
                row.species = "Male"

            elseif row.species == "K-F"
                row.species = "Female"

            # correct K-MF label to Male, plus another identical row with label Female
            elseif row.species == "K-MF"
               row.species = "Male"
            # I need to break off a copy of row into a new dataframe with no 
            # connection to the original row
               new_row = row[:] # I thought it was a copy     
               new_row.species = "Female"
               push!(data_frame, new_row) 
            # I end up with 2 rows with species=Female, instead of 1 male, 
            # 1 female as 'new_row.species = "Female"' modifies both the new 
            # row and the original
            end
        end

Kindness
David

Not at a computer to test, but can you try copy(row)?

Thanks, yes, I had already tried that, but I tried it again anyway.

Here is the error: “ERROR: setfield!: immutable struct of type NamedTuple cannot be changed”

copy() does not return a dataframe.

Somewhat confusingly, copy(row) returns a NamedTuple, not a new DataFrameRow. This is unfortunately what’s causing your problem.
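
To make the difference concrete, here is a small sketch. The two-column data frame and the `file` column are made up for illustration; only `species` comes from the question. It suggests that `row[:]` is still a view into the parent data frame, while `copy(row)` is detached but immutable.

```julia
using DataFrames

# Hypothetical miniature of the training data; `file` is an invented column.
df = DataFrame(species = ["K-MF"], file = ["a.wav"])
row = first(eachrow(df))

view_row = row[:]       # still a DataFrameRow, so it shares storage with df
tuple_row = copy(row)   # a NamedTuple: detached from df, but immutable

view_row.species = "Female"   # writes through to df

# A NamedTuple cannot be mutated, but merge builds a modified copy.
female_row = merge(tuple_row, (species = "Female",))
```

After this runs, `df.species[1]` has changed to "Female" even though only `view_row` was assigned to, whereas `tuple_row.species` still holds the original "K-MF".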

You can push! a dictionary to a data frame, so you might want to do

julia> Dict(k => v for (k, v) in enumerate(dfr))
Dict{Int64, Int64} with 2 entries:
  2 => 3
  1 => 1

instead of copy.

try this

for row in eachrow(df)
    # correct to Male, Female, Close to match newer annotations
    if row.species == "K-M"
        row.species = "Male"

    elseif row.species == "K-F"
        row.species = "Female"

    # correct K-MF label to Male, plus another identical row with label Female
    elseif row.species == "K-MF"
       row.species = "Male"
       push!(df, merge(row, (species="Female",))) 
    end
end
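
For anyone who would rather not grow the data frame while iterating over it, here is a minimal variation on the same `merge` idea. It is only a sketch: the `file` column and the `extra` buffer are names I made up, and the new rows are collected first and pushed after the loop.

```julia
using DataFrames

# Hypothetical sample data; only the species labels come from the thread.
df = DataFrame(species = ["K-M", "K-MF", "K-F"],
               file = ["a.wav", "b.wav", "c.wav"])

extra = []   # detached NamedTuple copies of the K-MF rows
for row in eachrow(df)
    if row.species == "K-M"
        row.species = "Male"
    elseif row.species == "K-F"
        row.species = "Female"
    elseif row.species == "K-MF"
        row.species = "Male"
        # merge on a DataFrameRow returns a NamedTuple, not a view
        push!(extra, merge(row, (species = "Female",)))
    end
end

# Append the copies once the loop has finished.
for r in extra
    push!(df, r)
end
```

The original K-MF row ends up labelled "Male" and its copy, with the same `file` value, is appended with the label "Female".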

an alternative way

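# return a vector in every branch so that flatten does not split the strings into characters
tdf = transform(df, :species => ByRow(x -> x == "K-M" ? ["M"] : (x == "K-F" ? ["F"] : ["M", "F"])) => :g)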

flatten(tdf,:g)
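
A self-contained sketch of the same transform-then-flatten idea, using the full "Male"/"Female" labels from the question. The `labels` lookup table and the `file` column are my own assumptions; labels that are not in the table (such as "Close") are passed through unchanged.

```julia
using DataFrames

# Hypothetical sample data; `file` is an invented column.
df = DataFrame(species = ["K-M", "K-MF", "K-F"],
               file = ["a.wav", "b.wav", "c.wav"])

# Every old label maps to a vector of new labels, so K-MF expands to two rows.
labels = Dict("K-M" => ["Male"],
              "K-F" => ["Female"],
              "K-MF" => ["Male", "Female"])

# Unknown labels are wrapped in a one-element vector and kept as they are.
tdf = transform(df, :species => ByRow(x -> get(labels, x, [x])) => :species)

# flatten repeats the other columns once per element of each vector.
flat = flatten(tdf, :species)
```

Unlike the loop versions, this keeps the two rows for each K-MF call next to each other instead of appending the copy at the end, and it leaves `df` itself untouched.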

The idea works, but since the keys of the dict are the column numbers rather than the column names, it is hard to merge back into the base dataframe. There are a lot of columns, or I would just bodge it manually. This works fine:

new_row = Dict(names(row) .=> values(row))       
new_row["species"] = "Female"
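
In case it helps someone later, here is how those two lines could fit into a complete example. This is a sketch with invented sample data (the `file` column is hypothetical), and it assumes `push!` accepts a dictionary keyed by column-name strings.

```julia
using DataFrames

# Hypothetical one-row data frame standing in for the training data.
df = DataFrame(species = ["K-MF"], file = ["a.wav"])
row = df[1, :]          # DataFrameRow, a view into df

row.species = "Male"    # relabel the original row in place

# Build a detached, mutable copy keyed by column name.
new_row = Dict(names(row) .=> values(row))
new_row["species"] = "Female"

push!(df, new_row)      # columns are matched by name
```

Because the dictionary holds its own values, changing `new_row["species"]` leaves the original row labelled "Male".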

Thanks
David

Thanks

I like this solution.

Regards
David

Oh sorry. Yeah, your version is correct.

Looks like you got a solution, but here is a different approach:

function replace_species(df)
    # create a copy of the K-MF rows, set them all to Female
    k_mf_rows = filter(AsTable(:) => r -> r.species == "K-MF", df)
    k_mf_rows.species .= "Female"

    # replace species in place (setting original K-MFs to Male)
    species_map = Dict(
        "K-M" => "Male",
        "K-F" => "Female",
        "K-MF" => "Male"
    )
    # labels that are not in the map (e.g. "Close") are left unchanged
    replace!(v -> get(species_map, v, v), df.species)
    
    # Stack on the copies
    new_df = vcat(df, k_mf_rows)
    return new_df
end

Edit: FWIW, depending on the size of the DataFrame and the number of K-MF values, this solution might be faster, because calling push! over and over has to grow every column one row at a time, whereas vcat adds all the copies in one step.