I think everyone agrees that handling "missing" is always a delicate matter, to be judged case by case.
For example, if no month in my data is missing all of its days, the following simple scheme does the trick (*):
combine(groupby(dfymd, [:y,:m]), :d => last)
It also gives me an extra row and column telling me that there are missing days:
using DataFrames, Dates
dt=unique(rand(Date(2021, 1, 1):Day(1):Date(2024, 12, 31),900))
dtf=[d in dt ? d : missing for d in Date(2021, 1, 1):Day(1):Date(2024, 12, 31)]
dff=DataFrame(;dtf)
tr(d)=[year(d), month(d),day(d)]
tr(d::Missing)=[missing,missing,missing]
dfymd=transform(dff, :dtf => (ByRow(tr)=>[:y,:m,:d]))
combine(groupby(dfymd, [:y,:m]), :d => last)
I put it in the following form so that it is easier to read:
julia> unstack(combine(groupby(dfymd, [:y,:m]), :d => last => :d),:y,:d,allowmissing=true)
13×6 DataFrame
 Row │ m        2021     2022     2023     2024     missing
     │ Int64?   Int64?   Int64?   Int64?   Int64?   Int64?
─────┼──────────────────────────────────────────────────────
   1 │       1       29       26       29       31  missing
   2 │       2       25       28       24       28  missing
   3 │       3       31       29       31       31  missing
   4 │       4       27       30       30       29  missing
   5 │       5       30       31       27       30  missing
   6 │       6       28       30       30       30  missing
   7 │       7       27       21       31       29  missing
   8 │       8       30       29       31       30  missing
   9 │       9       26       28       30       30  missing
  10 │      10       30       30       26       31  missing
  11 │      11       30       28       30       30  missing
  12 │      12       30       28       30       30  missing
  13 │ missing  missing  missing  missing  missing  missing
If there is a month with all of its days missing, things change, but not by much, it seems.
dtfam=copy(dtf)
dtfam[1:31].=missing
dffam=DataFrame(;dtfam)
dfymd=transform(dffam, :dtfam => (ByRow(tr)=>[:y,:m,:d]))
julia> unstack(combine(groupby(dfymd, [:y,:m]), :d => last => :d),:y,:d,allowmissing=true)
13×6 DataFrame
 Row │ m        2021     2022     2023     2024     missing
     │ Int64?   Int64?   Int64?   Int64?   Int64?   Int64?
─────┼──────────────────────────────────────────────────────
   1 │       2       28       27       27       26  missing
   2 │       3       31       31       31       31  missing
   3 │       4       30       25       27       29  missing
   4 │       5       31       28       30       30  missing
   5 │       6       28       27       28       30  missing
   6 │       7       29       24       31       22  missing
   7 │       8       31       31       31       30  missing
   8 │       9       30       29       26       29  missing
   9 │      10       30       30       29       30  missing
  10 │      11       29       27       29       29  missing
  11 │      12       31       22       31       31  missing
  12 │       1  missing       23       31       31  missing
  13 │ missing  missing  missing  missing  missing  missing
So I wonder what your real situation is.
PS
I also wonder why, in this case, January 2021 (with all of its days missing) was put at the end of the group.
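A possible explanation, sketched on made-up toy data (the values below are my own, not taken from the tables above): as far as I know, unstack orders its row keys by first appearance. If January 2021 has no group of its own, m = 1 first shows up in 2022, after months 2 to 12 of 2021, so it would land near the end. I have not checked this against the DataFrames.jl source, so treat it as an assumption.

```julia
using DataFrames

# Long table with no (2021, 1) row, mimicking a January 2021 whose days are all missing.
long = DataFrame(y = [2021, 2021, 2022, 2022, 2022],
                 m = [2, 3, 1, 2, 3],
                 d = [28, 31, 31, 28, 30])

# Rows are keyed by :m, column names come from :y, cell values from :d.
wide = unstack(long, :m, :y, :d)

# m = 1 first appears in 2022, so it should come after 2 and 3,
# and its 2021 cell should be missing.
println(wide)
```

If a month-sorted table is preferred, calling sort(wide, :m) afterwards should restore the usual order.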
(*) The reason is that a row with a missing date also gets a missing year and month, so groupby collects all such rows in a separate (missing, missing) group, and the real (year, month) groups contain no missing days.
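To make the footnote concrete, here is a minimal sketch on made-up toy data (the column values are hypothetical). It relies on my understanding that groupby keeps rows with missing keys as a group of their own by default, and drops them only when skipmissing=true is passed.

```julia
using DataFrames

# Toy data: the third row has no date, so year, month and day are all missing.
df = DataFrame(y = [2021, 2021, missing, 2021],
               m = [1, 1, missing, 2],
               d = [3, 17, missing, 9])

# Default behaviour: the (missing, missing) key forms its own group,
# so the other groups contain only real days.
g = groupby(df, [:y, :m])
res = combine(g, :d => last => :d)

# With skipmissing=true, rows whose grouping key is missing are left out.
res_skip = combine(groupby(df, [:y, :m]; skipmissing=true), :d => last => :d)

println(res)
println(res_skip)
```

The extra missing row and column in the unstacked tables come from that separate group. Passing skipmissing=true would remove them, at the cost of no longer seeing that some days are missing.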