Hi all,
I’ve created a DataFrame as seen below:
using DataFrames # package dependency
original_df = DataFrame(; Year=Int[], Name=String[],
Surname=String[], Rating=Int[], ValueOne=Float64[], ValueTwo=Float64[])
push!(original_df, (1, "Dick", "Jones", 1, 2.3, 2.9))
push!(original_df, (1, "Dick", "Grayson", 1, 3.1, 6.9))
push!(original_df, (1, "Dick", "Grayson", 3, 2.4, 6.9))
push!(original_df, (2, "Dick", "Grayson", 3, 7.4, 5.3))
push!(original_df, (2, "Bob", "Jeff", 3, 4.83, 6.29))
push!(original_df, (1, "Bob", "Jeff", 1, 3.3, 1.9))
push!(original_df, (2, "Bob", "Jeff", 3, 10.2, 1.01))
push!(original_df, (2, "Bob", "Man", 3, 10.9, 0.03))
The original_df
DataFrame should look like this:
I want to perform an operation on original_df
such that it will average the :ValueOne
and :ValueTwo
columns (separately) when the following columns share the same value: [:Year, :Name, :Rating]
. The resulting DataFrame should look like this if my mental math is correct:
using DataFrames # package dependency
resulting_df = DataFrame(; Year=Int[], Name=String[], Rating=Int[],
ValueOne=Float64[], ValueTwo=Float64[])
push!(resulting_df, (1, "Dick", 1, 2.7, 4.9))
push!(resulting_df, (1, "Dick", 3, 2.4, 6.9))
push!(resulting_df, (2, "Dick", 3, 7.4, 5.3))
push!(resulting_df, (2, "Bob", 3, 8.643, 2.443))
push!(resulting_df, (1, "Bob", 1, 3.3, 1.9))
The resulting_df
DataFrame should look like this:
Note that I don’t care what the sorting of the rows in resulting_df
looks like; I just care that it has those rows.
Would anybody be able to instruct me in how the following operation can be done please? The actual DataFrame that I’m going to do this on will have 15+ columns and there’ll be 10 that need to match for the averaging to be done, so preferably if I had a way to do the operation where I didn’t have to type in multiple conditions separated by the &&
operator then that would be good please. Regardless, the code examples above should be a decent starting point I think.
Thank you in advance.