Hello, everyone
I am new to Julia and I'm trying to group a date-indexed variable by year and month and then sum it within each month over the whole period covered by the dates.
I know how to do this in R, and I provide a small reproducible example below. In a nutshell, the variable `x` is indexed by date. I can get monthly sums of `x` for the years 2021 and 2022 by grouping my data by `year` and `month`. With the help of `dplyr` and `lubridate`, I can then easily sum all `x` in each period.
How would I do this in Julia? I don't expect anyone to fully reproduce the example, but I'd appreciate some pointers to the packages I should look into. From what I've gathered, the main options are `Queryverse` and/or `DataFramesMeta`.
```r
# Libraries
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# Generate fake data
x = runif(2*365, 0, 50)
# Generate a sequence of dates
my_seq = seq(as.Date("2021/1/1"), by = "day", length.out = length(x))
# Set up a data frame
my_data = tibble(my_seq, x)
# Use dplyr to group data points by year and month and sum them for each
# month.
monthly_x =
  my_data |>
  mutate(year = year(my_seq),
         month = month(my_seq)) |>
  group_by(year, month) |>
  summarise(sum_of_x = sum(x))
#> `summarise()` has grouped output by 'year'. You can override using the
#> `.groups` argument.
# Check January 2021
sum(x[1:31]) == monthly_x$sum_of_x[1]
#> [1] TRUE
```
Created on 2022-04-26 by the reprex package (v2.0.1)
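For comparison, the same year/month grouping can be sketched in base R without `dplyr` or `lubridate`, which may make the individual steps (derive keys from the dates, group, sum) easier to map onto another language. This is a rough sketch: `set.seed()` is added only so the run is reproducible, and the checks use `all.equal()` instead of `==` to tolerate floating-point rounding.

```r
# Base-R sketch of the year/month grouping (no dplyr/lubridate)
set.seed(42)  # added only for reproducibility; not in the original reprex
x <- runif(2 * 365, 0, 50)
my_seq <- seq(as.Date("2021-01-01"), by = "day", length.out = length(x))

# Grouping keys extracted directly from the dates
my_data <- data.frame(
  x     = x,
  year  = as.integer(format(my_seq, "%Y")),
  month = as.integer(format(my_seq, "%m"))
)

# One row per (month, year) combination, summing x within each group
monthly_x <- aggregate(x ~ month + year, data = my_data, FUN = sum)
names(monthly_x)[names(monthly_x) == "x"] <- "sum_of_x"

nrow(monthly_x)                                          # 24 months
isTRUE(all.equal(sum(x[1:31]), monthly_x$sum_of_x[1]))   # January 2021, should be TRUE
isTRUE(all.equal(sum(monthly_x$sum_of_x), sum(x)))       # grand total preserved, should be TRUE
```

`aggregate()` treats the last grouping variable as the most significant, so `month + year` yields rows from January 2021 through December 2022, the same row order as the `dplyr` output.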