How would I update a DataFrame with new data and delete old data so I can maintain the N most recent rows per person

Julia1 · July 1, 2020, 1:49pm

Hi, how’s it going? Here is an example of the DataFrame, and a new row of data to be added.

using DataFrames, Dates

df = DataFrame(:name=>["john","john","john","john","john","mike","mike","mike","mike","mike"],
:day=>["2020-05-14","2020-05-13","2020-05-12","2020-05-11","2020-05-10","2020-05-14","2020-05-13","2020-05-12","2020-05-11","2020-05-10"],
:earnings=>[20,15,17,32,15,87,65,80,56,90])
df[:day] = DateTime.(df[:day])


new_data = DataFrame(:name=>["john"],:day=>["2020-05-15"],:earnings=>[18])
new_data[:day] = DateTime.(new_data[:day])

In this case, I would like to maintain the 5 most recent data entries per “name”. So the new_data row should be added to df, and the oldest row for “john” which was on “2020-05-10” should be popped off the back end.

What is the best and most efficient way to implement this?

pdeffebach · July 1, 2020, 2:17pm

You should really start using the df[:, name] indexing format, because df[:day] will break when DataFrames hits 1.0, which is happening soonish.
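
As a rough sketch of that indexing style (the two-row `df` here is a made-up stand-in, not the data from the question): `df[:, :day]` reads a copy of the column, while `df[!, :day]` refers to the column itself. Since the conversion in the question changes the column's element type from `String` to `DateTime`, the sketch assumes the `!` form is the one to use on the left-hand side.

```julia
using DataFrames, Dates

# Small illustrative frame (hypothetical data)
df = DataFrame(name = ["john", "mike"], day = ["2020-05-14", "2020-05-13"])

# Row-and-column indexing: `:` returns a copy of the column
day_copy = df[:, :day]

# Replacing the String column with DateTime values changes its element type,
# so assign through `!`, which swaps in the new column
df[!, :day] = DateTime.(df[!, :day])
```

`df.day = DateTime.(df.day)` should behave the same way as the last line.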

I think I would do the following:

1. Make the thing you add to the DataFrame a named tuple, rather than a one-row data frame.
2. Use this function:

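function add_and_delete!(df, nt::NamedTuple)
    push!(df, nt)               # append the new row
    sort!(df, [:name, :day])    # within each name, the oldest day comes first
    # the first row matching the new row's name is that person's oldest entry
    ind = findfirst(t -> t.name == nt.name, eachrow(df))
    deleteat!(df, ind)          # spelled delete! before DataFrames 1.3
    return df
end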
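
The function above always removes one row for the person, even if they have fewer than N rows so far. A possible variant that only trims when the count exceeds N might look like the sketch below. `push_and_trim!` and its `n` keyword are hypothetical names, not something from DataFrames, and the sketch assumes a DataFrames version that provides `deleteat!` for data frames (1.3 or later).

```julia
using DataFrames, Dates

# Hypothetical helper: append a row, then keep only that person's
# `n` most recent days. Other people's rows are left untouched.
function push_and_trim!(df::DataFrame, nt::NamedTuple; n::Integer = 5)
    push!(df, nt)
    # Row numbers belonging to this person
    rows = findall(==(nt.name), df.name)
    if length(rows) > n
        # Order this person's rows newest first, then drop everything past the n-th
        newest_first = rows[sortperm(df.day[rows]; rev = true)]
        deleteat!(df, sort(newest_first[n+1:end]))
    end
    return df
end

# Same data as in the question
df = DataFrame(
    name = repeat(["john", "mike"], inner = 5),
    day = repeat(DateTime.(["2020-05-14", "2020-05-13", "2020-05-12", "2020-05-11", "2020-05-10"]), 2),
    earnings = [20, 15, 17, 32, 15, 87, 65, 80, 56, 90],
)

# The new row is passed as a named tuple rather than a one-row DataFrame
push_and_trim!(df, (name = "john", day = DateTime("2020-05-15"), earnings = 18))
```

Unlike the version with `sort!`, this leaves the existing row order alone and only removes the surplus rows.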

Julia1 · July 1, 2020, 2:26pm

OK, sounds good, thank you very much. Old habits die hard, but obviously they’ll die completely if the code breaks, so I’ll get on that.

Thanks again
