How would I update a DataFrame with new data and delete old data so I can maintain the N most recent rows per person

Hi how’s it going? Here is an example of the dataframe, and a new data row to be added.

df = DataFrame(:name=>["john","john","john","john","john","mike","mike","mike","mike","mike"]
df[:day] = DateTime.(df[:day])

new_data = DataFrame(:name=>["john"],:day=>["2020-05-15"],:earnings=>[18])
new_data[:day] = DateTime.(new_data[:day])

heres a screenshot for a better view of the data:
Screenshot from 2020-07-01 09-49-37

Screenshot from 2020-07-01 09-50-38
In this case, I would like to maintain the 5 most recent data entries per “name”. So the new_data row should be added to df, and the oldest row for “john” which was on “2020-05-10” should be popped off the back end.

What is the best and most efficient way to implement this?

You should really start using the df[:, name] indexing format, because that will break when DataFrames hits 1.0, which is happening soonish.

I think I would do the following

  1. Make the thing you add to the dataframe a named tuple, rather than a one-row data frame
  2. use this function
julia> function add_and_delete!(df, nt::NamedTuple)
       push!(df, nt)
       sort!(df, [:name, :day])
       ind = findfirst(t -> == personname, eachrow(df))
       delete!(df, ind)
       return df
1 Like

ok sounds good thank you very much. old habits die hard. but yeah obviously they’ll completely die if the code breaks so I’ll get on that.

Thanks again