Delete rows in DataFrame Conditionally

I’m trying to delete specific rows in my DataFrame based on a condition involving the row above. In my data, a row with the same time point is repeated each time a medication is administered. My goal is to delete all of the repeated rows.

I’ve tried the following, and also looked for some kind of deletion tool in Queryverse, with no luck:

for i = 1:length(infusion_single.time)
    if infusion_single.time[i] == infusion_single.time[i+1]
        delete!(infusion_single.time[i+1])
    end
end

Thanks!

Have you looked at the unique! function?


If all the duplicates are the exact same row, unique(df) will work.
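For example, a quick sketch on a made-up table (the column names and values here are just for illustration, not from your data):

using DataFrames

df = DataFrame(time = [0.0, 0.0, 1.0, 2.0, 2.0], conc = [1.2, 1.2, 3.4, 5.6, 5.6])
unique(df)  # keeps one copy of each fully identical row, leaving 3 rows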

If you are just looking for duplicates where the rows are next to each other, here is a function that will work.

One thing that makes your loop tricky is that you are modifying the collection you are looping over, which makes the behavior hard to reason about. My function builds a new data frame and pushes the rows you want to keep into it instead.

function dropdupecols(df, cols)
    new_df = DataFrame()
    last_row = nothing  # sentinel so the first row is always kept
    for row in eachrow(df)
        # keep the row unless it matches the previous row on the given columns
        if last_row === nothing || row[cols] != last_row[cols]
            push!(new_df, row)
        end
        last_row = row
    end
    return new_df
end
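Rough usage sketch, with a made-up infusion-style table (the dose column and values are just illustrative):

using DataFrames

infusion = DataFrame(time = [0, 1, 1, 2, 3, 3], dose = [0, 0, 5, 0, 0, 5])
dropdupecols(infusion, [:time])  # keeps the first of each run of equal times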

unique! takes a column argument as well, yeah?

unique!(infusion_single, :time)
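On a small made-up table that would look something like this (values are just illustrative):

using DataFrames

df = DataFrame(time = [0, 1, 1, 2], amt = [0, 0, 100, 0])
unique!(df, :time)  # keeps the first row for each distinct time, modifying df in place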

This should work if I understand correctly that time is a numeric column:

infusion_single = infusion_single[[true; diff(infusion_single.time) .!= 0], :]

The true keeps the first row because diff returns an array one element shorter than the column.
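A small sketch of how that mask works, with toy values and assuming the data is sorted so duplicates are adjacent:

using DataFrames

infusion_single = DataFrame(time = [0.0, 1.0, 1.0, 2.0], conc = [0.0, 3.1, 3.1, 2.5])
mask = [true; diff(infusion_single.time) .!= 0]  # [true, true, false, true]
infusion_single = infusion_single[mask, :]       # drops the repeated 1.0 row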