Filter dataframe by date, e.g. yearmonth

Basically, I have a column of dates and a 3 month period I want to filter the dataframe down to. I have, e.g

yms = yearmonth.([Date(2019,4),Date(2019,5),Date(2019,6)])

and then I try something like

df2 = @where(df1, yearmonth.(:date_column) in yms)

This gives me some error about booleans. I’ve tried other syntaxes (not using dataframesmeta), AND’d greater/less thans, and so on, to not avail. Just a lot of opaque-to-me errors.

My next stop is to make a column of booleans to filter on, but before then I wanted to get some input. I’m finding the DateTime thing generally kind of awkward, with a lot of things like averaging or plotting durations requiring extra work.

Finally, I am using timeseries, but for these parts of the process I don’t have a homogeneous (single-type) array, which timeseries requires.

Thanks in advance.

Maybe this?

julia>  using DataFrames, Dates

julia> dates = [Date(2019,4),Date(2019,5),Date(2019,6)];

julia>  yms = [yearmonth(d) for d in dates];

julia>  df = DataFrame(a = rand(4), b = [Date(2019,7), dates...]);

julia> df[in(yms).(yearmonth.(df.b)), :]
3Γ—2 DataFrame
β”‚ Row β”‚ a         β”‚ b          β”‚
β”‚     β”‚ Float64   β”‚ Date       β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 0.896528  β”‚ 2019-04-01 β”‚
β”‚ 2   β”‚ 0.711173  β”‚ 2019-05-01 β”‚
β”‚ 3   β”‚ 0.0872949 β”‚ 2019-06-01 β”‚

To explain (as I’ve been stumped by this and the documentation isn’t great), doing in(yms) creates a Fix2 function object, which can then be called with a single argument to check whether that argument is in yms, e.g.

check_yms = in(yms)
check_yms((2019,4)) # returns true

This Fix2 object is then broadcast over column df.b, which in itself has been transformed by a broadcasted call to yearmonth; the result of this gives you a BitArray that is your boolean mask for row selection:

julia> in(yms).(yearmonth.(df.b))
4-element BitArray{1}:
 0
 1
 1
 1

The only documentation Google could find on Fix2 is for the isequal function in Base, which can similarly be transformed to a one-argument version.

in also accepts two arguments so you might find this approach a bit simpler:

julia> x = rand(1:5, 10)
10-element Array{Int64,1}:
 2
 1
 3
 3
 2
 3
 3
 5
 4
 4

julia> in.(x, Ref([1,2,3]))
10-element BitArray{1}:
  true
  true
  true
  true
  true
  true
  true
 false
 false
 false

(note you have to protect the collection you are searching in from broadcasting)

Is it worth adding this somewhere to the docs or your tutorial? Apologies if I’ve missed it, but this was a question I asked a while back myself and I feel like I’ve answered a few similar ones here and on Slack since.