I am stuck on this for loop. I need a copy of a dataframe row so I can modify it and then push the modified row onto the dataframe.
My K-MF label means a male and a female kiwi are calling at the same time. I want to split it into 2 labels (rows): one for the female call and one for the male call.
This all happens in a function that builds my training dataset, which is quite large and growing.
for row in eachrow(data_frame)
    # correct to Male, Female, Close to match newer annotations
    if row.species == "K-M"
        row.species = "Male"
    elseif row.species == "K-F"
        row.species = "Female"
    # correct K-MF label to Male, plus another identical row with label Female
    elseif row.species == "K-MF"
        row.species = "Male"
        # I need to break off a copy of row into a new dataframe with no
        # connection to the original row
        new_row = row[:] # I thought it was a copy
        new_row.species = "Female"
        push!(data_frame, new_row)
        # I end up with 2 rows with species=Female, instead of 1 male,
        # 1 female as 'new_row.species = "Female"' modifies both the new
        # row and the original
    end
end
Kindness
David
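For context on the symptom above: the rows that eachrow yields, like df[i, :], are DataFrameRow objects, which are views into the parent data frame rather than copies. Assigning to one of their fields writes straight into the original table. A minimal check on a toy data frame (the columns id and species are made up for illustration):

```julia
using DataFrames

# Toy data; the columns are hypothetical stand-ins for the training set
df = DataFrame(id = [1, 2], species = ["K-M", "K-MF"])

row = df[2, :]               # a DataFrameRow: a view of row 2, not a copy
row.species = "Male"         # writes through to the parent data frame

println(df.species[2])       # prints Male
println(parent(row) === df)  # prints true: the row still points at df
```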
Not at a computer to test, but can you try copy(row)?
Thanks, yes, I had already tried that, but I tried it again anyway.
Here is the error: “ERROR: setfield!: immutable struct of type NamedTuple cannot be changed”
copy() does not return a dataframe row.
Somewhat confusingly, copy(row) returns a NamedTuple, not a new DataFrameRow. This is unfortunately what’s causing your problem.
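Because copy(row) gives an immutable NamedTuple, the modified record has to be built as a new NamedTuple rather than edited in place. A minimal sketch on toy data (columns hypothetical) using Base's merge, where fields of the second argument override those of the first; push! accepts the resulting NamedTuple because its field names match the columns:

```julia
using DataFrames

df = DataFrame(id = [1], species = ["K-MF"])

nt = copy(df[1, :])                        # NamedTuple snapshot of row 1, detached from df
# nt.species = "Female" would throw: NamedTuples cannot be mutated
female = merge(nt, (species = "Female",))  # new NamedTuple with only :species changed

df[1, :species] = "Male"                   # relabel the original row in place
push!(df, female)                          # append the independent copy
```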
You can push! a dictionary to a data frame, so you might want to do
julia> Dict(k => v for (k, v) in enumerate(dfr))
Dict{Int64, Int64} with 2 entries:
  2 => 3
  1 => 1
instead of copy.
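push! matches a dictionary's keys against the column names, so the dictionary needs to be keyed by name; enumerate yields column positions instead. A minimal sketch that builds a name-keyed Dict from a row via propertynames, on hypothetical toy data:

```julia
using DataFrames

df = DataFrame(id = [1], species = ["K-MF"])
row = df[1, :]

# Key by column name (Symbol), not by column position
d = Dict(k => row[k] for k in propertynames(row))
d[:species] = "Female"   # editing the Dict does not touch df
row.species = "Male"     # relabel the original row through the view

push!(df, d)             # columns are matched by key, so order does not matter
```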
Try this:
for row in eachrow(df)
    # correct to Male, Female, Close to match newer annotations
    if row.species == "K-M"
        row.species = "Male"
    elseif row.species == "K-F"
        row.species = "Female"
    # correct K-MF label to Male, plus another identical row with label Female
    elseif row.species == "K-MF"
        row.species = "Male"
        push!(df, merge(row, (species="Female",)))
    end
end
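The merge call works because DataFrames.jl lets merge take a DataFrameRow: the result is a fresh NamedTuple in which fields of the later argument win, so it is already detached from df by the time push! sees it. A small check, assuming toy columns:

```julia
using DataFrames

df = DataFrame(id = [7], species = ["K-MF"])
row = df[1, :]

female = merge(row, (species = "Female",))  # fields from the second argument win

println(female)           # prints (id = 7, species = "Female")
println(df.species[1])    # prints K-MF: merge built a new value and left df alone
```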
An alternative way:
tdf = transform(df, :species => ByRow(x -> x == "K-M" ? ["Male"] : x == "K-F" ? ["Female"] : x == "K-MF" ? ["Male", "Female"] : [x]) => :g)
flatten(tdf, :g)
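A usage sketch of the transform/flatten idea on toy data (columns hypothetical), writing the relabelled vectors straight back into :species. Each label maps to a vector, and flatten emits one row per element while repeating every other column:

```julia
using DataFrames

df = DataFrame(id = 1:3, species = ["K-M", "K-F", "K-MF"])

# Always return a Vector: flatten iterates each cell, and a bare String is itself
# iterable. Labels other than the three K-* codes pass through unchanged.
relabel(x) = x == "K-M"  ? ["Male"] :
             x == "K-F"  ? ["Female"] :
             x == "K-MF" ? ["Male", "Female"] : [x]

out = flatten(transform(df, :species => ByRow(relabel) => :species), :species)
```

The K-MF row comes out as two adjacent rows with the same id, one Male and one Female, and df itself is left unchanged.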
The idea works, but since the keys of the dict are column numbers, it is hard to merge back into the base dataframe. There are a lot of columns, or I would just bodge it manually. This works fine:
new_row = Dict(names(row) .=> values(row))
new_row["species"] = "Female"
Thanks
David
Oh sorry. Yeah, your version is correct.
Looks like you got a solution, but here is a different approach:
function replace_species(df)
    # create a copy of the K-MF rows, set them all to Female
    k_mf_rows = filter(:species => ==("K-MF"), df)
    k_mf_rows.species .= "Female"
    # replace species in place (setting original K-MFs to Male);
    # labels missing from the map are left unchanged
    species_map = Dict(
        "K-M" => "Male",
        "K-F" => "Female",
        "K-MF" => "Male"
    )
    replace!(v -> get(species_map, v, v), df.species)
    # Stack on the copies
    new_df = vcat(df, k_mf_rows)
    return new_df
end
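Note that replace_species also rewrites df.species of the data frame that is passed in. If the original labels need to survive, here is a minimal non-mutating sketch of the same filter-then-vcat idea; the name replace_species_copy and the constant SPECIES_MAP are hypothetical:

```julia
using DataFrames

# Hypothetical mapping; labels not listed (e.g. "Close") are kept as they are
const SPECIES_MAP = Dict("K-M" => "Male", "K-F" => "Female", "K-MF" => "Male")

function replace_species_copy(df::AbstractDataFrame)
    extra = subset(df, :species => ByRow(==("K-MF")))  # new data frame, detached from df
    extra.species = fill("Female", nrow(extra))
    out = copy(df)                                      # leave the caller's df untouched
    out.species = [get(SPECIES_MAP, s, s) for s in out.species]
    return vcat(out, extra)
end

df = DataFrame(id = 1:4, species = ["K-M", "K-F", "K-MF", "Close"])
new_df = replace_species_copy(df)
```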
Edit: FWIW, depending on the size of the DataFrame and the number of K-MF values, this solution might be faster, because calling push! once per row is slow: every call appends to each column separately, which can trigger repeated reallocations.
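Before switching a large pipeline over, it may be worth checking that the row-by-row push! version and the filter/vcat version produce the same table. A sketch on toy data (columns hypothetical); it compares results only and does not measure speed:

```julia
using DataFrames

df = DataFrame(id = 1:4, species = ["K-M", "K-MF", "K-F", "K-MF"])
relabel_pairs = ("K-M" => "Male", "K-F" => "Female", "K-MF" => "Male")

# Row-by-row version: one push! per K-MF row (indices collected before df grows)
looped = copy(df)
for i in findall(==("K-MF"), looped.species)
    push!(looped, merge(copy(looped[i, :]), (species = "Female",)))
end
replace!(looped.species, relabel_pairs...)

# Vectorised version: filter once, relabel once, vcat once
extra = filter(:species => ==("K-MF"), df)
extra.species = fill("Female", nrow(extra))
vectorised = vcat(df, extra)
replace!(vectorised.species, relabel_pairs...)

println(looped == vectorised)  # both append the Female copies at the end, in order
```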