TBulka
1
Hello everyone,
say, I construct a data frame like this:
using Dates
using DataFrames
datevec = collect(Date(2020, 02, 22):Day(1):Date(2020, 03, 03))
valuevec = ["A", "B", "A", "A", "C", "C", "A", "C", "A", "A", "B"]
df = DataFrame(d=datevec, v=valuevec)
What I would like to do now is to count how often “A”, “B” and “C” occur each month. Is there any elegant way to do this?
Please ask, if you need additional information.
Thanks very much for your help in advance!
Thomas
sijo
2
Here’s a solution:
# Add a column for the month
df.m = month.(df.d)
combine(groupby(df, [:m, :v]), nrow)
or without changing the original dataframe:
df2 = transform(df, :d => ByRow(month) => :m)
combine(groupby(df2, [:m, :v]), nrow)
Result:
5×3 DataFrame
Row │ m v nrow
│ Int64 String Int64
─────┼──────────────────────
1 │ 2 A 4
2 │ 2 B 1
3 │ 2 C 3
4 │ 3 A 2
5 │ 3 B 1
1 Like
If you want to preserve dates, you can use trunc
instead of month
julia> df2 = transform(df, :d => (x -> trunc.(x, Dates.Month)) => :m);
julia> combine(groupby(df2, [:m, :v]), nrow)
5×3 DataFrame
Row │ m v nrow
│ Date String Int64
─────┼───────────────────────────
1 │ 2020-02-01 A 4
2 │ 2020-02-01 B 1
3 │ 2020-02-01 C 3
4 │ 2020-03-01 A 2
5 │ 2020-03-01 B 1
1 Like
TBulka
4
Hello everyone and thanks for your advice. It now works as intented!