Sometimes I struggle to think of a compact way of doing things. Well, most times.
I have a dataframe of election results for a given election year:
year state office district party votecount name 2012 AL representative 1 Republican 99000 Bob Smith
It’s got every state, office, district, party. I have these going back to 1970. There is one district for president or senator, but potentially many districts for representative. I keep year so that when I recombine or use the results elsewhere, the year is there. Soon, I’ll get the presidential votes broken down by district, too.
I want an output dataframe that has just one row per office per district that contains the maximum votegetter for that office/district. I couldn’t think of how to do it with the DataFramesMeta package.
So, I wrote a function:
function maxvoteperoffice(df) result = DataFrame([Int,String,Int,String,String,Int], [:Year, :State, :CD, :Office, :Party, :Votecount], 0) sort!(df,cols=(:State, :Office, :CD, :Votecount), rev = (false, false, false, true)) prevstate = "" prevoffice = "" prevcd = 0 for r in eachrow(df) if prevstate == r.State && prevoffice == r.Office && prevcd == r.CD continue else push!(result,[r.Year r.State r.CD r.Office r.Party r.Votecount]) prevstate = r.State prevoffice = r.Office prevcd = r.CD end end return result end
Not being very clever, I sorted the thing (fast–we’re talking 1500 to 2000 rows for the input df). Then, I grab the votecount in the first row for each Office/district/state. Then, I keep skipping until I reach a new Office/district/state.
It’s not horribly slow at .02 seconds. It’s not horrible to sort: any grouping strategy relies on sorting. But, it’s a bit clumsy and not at all general.
Is there a clever way using the database oriented functions/macros for dataframes? You don’t have to rewrite this–just point me in the right direction with the crucial one-liner or approach. I thought a pipe line of @linq operations would do it, but I couldn’t figure out how to get the maximum of a group.
I won’t die if this goes a-begging. The function works.