I’m trying to delete specific rows in my DataFrame based on a condition involving the row above. In my data, every time a medication is administered, the row for that time point is repeated. My goal is to delete all of these repeated rows.
I’ve tried the following, and I also looked for some kind of deletion tool in Queryverse, with no luck:
for i = 1:length(infusion_single.time)
    if infusion_single.time[i] == infusion_single.time[i+1]
        delete!(infusion_single.time[i+1])
    end
end
If every duplicate is identical to the row it repeats in all columns, unique(df) will remove them.
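A minimal sketch of the unique route, assuming DataFrames.jl and a made-up infusion_single table (the amt column and all values are hypothetical). It also shows unique(df, :time), which keeps the first row for each distinct time even if other columns differ, and the in-place unique!:

```julia
using DataFrames

# Hypothetical data: each dosing time appears twice.
infusion_single = DataFrame(time = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0],
                            amt  = [100, 100, 0, 50, 50, 0])

# Drop rows that are identical in every column (the first occurrence is kept).
no_exact_dupes = unique(infusion_single)

# Drop rows whose :time value has already appeared, regardless of the other columns.
one_per_time = unique(infusion_single, :time)

# In-place variant: modifies infusion_single directly.
unique!(infusion_single)
```

Note that unique looks at the whole table, not just neighbouring rows, so a time point that legitimately reappears later would also be dropped.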
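If you only want to drop duplicates that sit directly below the row they repeat, here is a function that will work.
One thing that makes your loop hard to get right is that you are modifying the thing you are looping over: once a row is removed, the remaining rows shift up and the loop's indices no longer point at the rows you meant to check. The loop also reads infusion_single.time[i+1] on its last iteration, which is out of bounds, and delete! is being called on a single value rather than on the data frame. My function instead makes a new data frame and pushes the rows to keep into it.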
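using DataFrames

# Keep a row only if its values in `cols` differ from those of the row directly above it.
# `cols` can be a single column name (e.g. :time) or a vector of column names.
function dropdupecols(df, cols)
    new_df = similar(df, 0)  # empty table with the same columns and element types as df
    for i in 1:nrow(df)
        # The first row has nothing above it, so it is always kept.
        # isequal (unlike !=) returns a Bool even when a value is missing.
        if i == 1 || !isequal(df[i, cols], df[i - 1, cols])
            push!(new_df, df[i, :])
        end
    end
    return new_df
end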
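If you would rather not push rows one at a time, a similar adjacent-only result can be sketched with a Boolean mask. This assumes DataFrames.jl and a hypothetical table with a time column (the data are made up); the in-place line assumes a recent DataFrames.jl where rows are removed with deleteat!(df, inds) (older releases spelled this delete!(df, inds)):

```julia
using DataFrames

# Hypothetical data: each dosing row is repeated directly below itself,
# and time 1.0 legitimately reappears at the end.
df = DataFrame(time = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0],
               amt  = [100, 100, 0, 50, 50, 0])

# keep[i] is true unless row i repeats row i-1 in the :time column.
keep = [i == 1 || !isequal(df.time[i], df.time[i - 1]) for i in 1:nrow(df)]

# Non-mutating: a new DataFrame with only the kept rows; df is unchanged.
deduped = df[keep, :]

# In-place: all row indices are computed first and deleted in one call,
# so nothing is removed while the rows are still being inspected.
deleteat!(df, findall(!, keep))
```

Unlike unique(df, :time), the mask keeps the final 1.0 row, because it only compares each row with its immediate predecessor.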