Counting Value Occurences in Time Range in DataFrames

Hello everyone,

say, I construct a data frame like this:

using Dates
using DataFrames

datevec = collect(Date(2020, 02, 22):Day(1):Date(2020, 03, 03))
valuevec = ["A", "B", "A", "A", "C", "C", "A", "C", "A", "A", "B"]
df = DataFrame(d=datevec, v=valuevec)

What I would like to do now is to count how often “A”, “B” and “C” occur each month. Is there any elegant way to do this?

Please ask, if you need additional information.

Thanks very much for your help in advance!

Thomas

Here’s a solution:

# Add a column for the month
df.m = month.(df.d)

combine(groupby(df, [:m, :v]), nrow)

or without changing the original dataframe:

df2 = transform(df, :d => ByRow(month) => :m)

combine(groupby(df2, [:m, :v]), nrow)

Result:

5×3 DataFrame
 Row │ m      v       nrow  
     │ Int64  String  Int64 
─────┼──────────────────────
   1 │     2  A           4
   2 │     2  B           1
   3 │     2  C           3
   4 │     3  A           2
   5 │     3  B           1
1 Like

If you want to preserve dates, you can use trunc instead of month

julia> df2 = transform(df, :d => (x -> trunc.(x, Dates.Month)) => :m);

julia> combine(groupby(df2, [:m, :v]), nrow)
5×3 DataFrame
 Row │ m           v       nrow
     │ Date        String  Int64
─────┼───────────────────────────
   1 │ 2020-02-01  A           4
   2 │ 2020-02-01  B           1
   3 │ 2020-02-01  C           3
   4 │ 2020-03-01  A           2
   5 │ 2020-03-01  B           1
1 Like

Hello everyone and thanks for your advice. It now works as intented!