DataFrame, aggregate by month of date

Sorry for the basic question.
I have a DataFrame with two columns, Date and Count::Int.
What is the easiest way to calculate sum(Count) per month?

P.S. I have done it with a loop, but I feel there has to be a better way.

Thank you in advance.


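using DataFrames
using Dates
dates = Date("2015-01-01") .+ Day.(1:1000)
items = (1:1000) .% 7
df = DataFrame(:dt => dates, :count => items)
# Note: month() returns 1-12, so the same month of different years is pooled together
df[!, :month] = month.(df[!, :dt])
# `by` was removed in DataFrames 1.0; combine(groupby(...)) is its replacement
combine(groupby(df, :month), :count => sum)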

Hope this helps!


I think the best way is to use Query.jl

using DataFrames, Query, Dates

df = DataFrame(name=["John", "Sally", "Kirk", "Sanders"], age=[23., 42., 59., 31.], children=[3,2,2,0], date=[Date("2015-01-01"),Date("2015-01-10"),Date("2015-02-20"), Date("2015-02-05")])

x = df |>
    @groupby(Dates.format(_.date, "yyyy-mm")) |>
    @map({Key=key(_), Count=length(_)}) |>
    DataFrame

println(x)
2×2 DataFrame
│ Row │ Key     │ Count │
│     │ String  │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 2015-01 │ 2     │
│ 2   │ 2015-02 │ 2     │

I am brand new to Julia, so I apologize if the format of this post is not quite right. I had the same question and found this thread while searching for a solution. I took the example above and modified it slightly, using Query's LINQ-style syntax. I needed the mean of a column grouped by month, and I wanted the resulting DataFrame to keep the month as a date so I could use it in a time series. Other languages I use have a function that returns this directly.

It is probably not elegant, but it works. It returns dates in y-m-d format, set to the first day of each month; I could not get it to return y-m only, but it does aggregate by month. Any aggregation function should work, I would think. I modified the example DataFrame above by adding an extra row.

using DataFrames, Query, Dates, Statistics

df_2 = DataFrame(name=["John", "Sally", "Kirk", "Sanders", "Hank"], age=[23., 42., 59., 31., 65], children=[3,2,2,0, 5], date=[Date("2015-01-01"),Date("2015-01-10"),Date("2015-02-20"), Date("2015-02-05"), Date("2015-02-28")])

df_3 = @from i in df_2 begin
    # Format as "yyyy-mm", then parse back to a Date (first day of that month)
    @group i by Date(Dates.format(i.date, "yyyy-mm")) into g
    @select {Month=key(g), Mean_age=mean(g.age)}
    @collect DataFrame
end
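
If the goal is to keep the month as a real Date for a time series, one option that may be simpler is `firstdayofmonth` from the Dates standard library, which maps each date to the first day of its month without going through a string. A minimal sketch with plain DataFrames, using a trimmed copy of the example data (the names `monthly`, `Month` and `Mean_age` are just illustrative choices):

```julia
using DataFrames, Dates, Statistics

df_2 = DataFrame(age = [23.0, 42.0, 59.0, 31.0, 65.0],
                 date = Date.(["2015-01-01", "2015-01-10", "2015-02-20",
                               "2015-02-05", "2015-02-28"]))

# Map every date to the first day of its month, so the key stays a Date
df_2.Month = firstdayofmonth.(df_2.date)

# Group on that key and average age within each month
monthly = combine(groupby(df_2, :Month), :age => mean => :Mean_age)
```

Because the key is still a Date, it also distinguishes the same month in different years.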

Here's a solution using just DataFrames:

using DataFrames, Dates, Statistics

df_2 = DataFrame(name=["John", "Sally", "Kirk", "Sanders", "Hank"],
                 age=[23., 42., 59., 31., 65],
                 children=[3,2,2,0,5],
                 date=[Date("2015-01-01")
                       Date("2015-01-10")
                       Date("2015-02-20")
                       Date("2015-02-05")
                       Date("2015-02-28")])

df_3 = transform(df_2, :date => ByRow(yearmonth) => :Month)
df_4 = combine(groupby(df_3, :Month), :age => mean => :Mean_age)

# Result:
2×2 DataFrame
 Row │ Month      Mean_age
     │ Tuple…     Float64
─────┼─────────────────────
   1 │ (2015, 1)   32.5
   2 │ (2015, 2)   51.6667

The same with syntax sugar from DataFramesMeta:

using DataFramesMeta

@chain df_2 begin
    @rtransform :Month = yearmonth(:date)
    groupby(:Month)
    @combine :Mean_age = mean(:age)
end

and if you want the month as a "year-month" string:

@chain df_2 begin
    @rtransform :Month = join(yearmonth(:date), '-')
    groupby(:Month)
    @combine :Mean_age = mean(:age)
end

# Result:
2×2 DataFrame
 Row │ Month   Mean_age
     │ String  Float64
─────┼──────────────────
   1 │ 2015-1   32.5
   2 │ 2015-2   51.6667
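
One caveat with `join(yearmonth(...), '-')`: the month is not zero-padded, so as strings "2015-10" would sort before "2015-2". If a sortable key matters, `Dates.format` with the "yyyy-mm" pattern (the same pattern used in the Query example earlier in the thread) is one way to get it. A rough sketch applied to the original question, sum of a count per month, on made-up data (the column names `dt`, `count`, `month` and `total` are assumptions for illustration):

```julia
using DataFrames, Dates

# Toy data shaped like the original question: one row per day, count of 1 each
df = DataFrame(dt = Date(2015, 1, 1) .+ Day.(0:89), count = fill(1, 90))

# Zero-padded "yyyy-mm" key, so string ordering matches calendar ordering
df.month = Dates.format.(df.dt, "yyyy-mm")

# Sum the counts within each year-month
monthly = combine(groupby(df, :month), :count => sum => :total)
```

With one row per day, each total here is simply the number of days in that month.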