Filter dataframe by date, e.g. yearmonth

Basically, I have a column of dates and a 3 month period I want to filter the dataframe down to. I have, e.g

yms = yearmonth.([Date(2019,4),Date(2019,5),Date(2019,6)])

and then I try something like

df2 = @where(df1, yearmonth.(:date_column) in yms)

This gives me some error about booleans. I’ve tried other syntaxes (not using dataframesmeta), AND’d greater/less thans, and so on, to not avail. Just a lot of opaque-to-me errors.

My next stop is to make a column of booleans to filter on, but before then I wanted to get some input. I’m finding the DateTime thing generally kind of awkward, with a lot of things like averaging or plotting durations requiring extra work.

Finally, I am using timeseries, but for these parts of the process I don’t have a homogeneous (single-type) array, which timeseries requires.

Thanks in advance.

Maybe this?

julia>  using DataFrames, Dates

julia> dates = [Date(2019,4),Date(2019,5),Date(2019,6)];

julia>  yms = [yearmonth(d) for d in dates];

julia>  df = DataFrame(a = rand(4), b = [Date(2019,7), dates...]);

julia> df[in(yms).(yearmonth.(df.b)), :]
3Γ—2 DataFrame
β”‚ Row β”‚ a         β”‚ b          β”‚
β”‚     β”‚ Float64   β”‚ Date       β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 0.896528  β”‚ 2019-04-01 β”‚
β”‚ 2   β”‚ 0.711173  β”‚ 2019-05-01 β”‚
β”‚ 3   β”‚ 0.0872949 β”‚ 2019-06-01 β”‚

To explain (as I’ve been stumped by this and the documentation isn’t great), doing in(yms) creates a Fix2 function object, which can then be called with a single argument to check whether that argument is in yms, e.g.

check_yms = in(yms)
check_yms((2019,4)) # returns true

This Fix2 object is then broadcast over column df.b, which in itself has been transformed by a broadcasted call to yearmonth; the result of this gives you a BitArray that is your boolean mask for row selection:

julia> in(yms).(yearmonth.(df.b))
4-element BitArray{1}:
 0
 1
 1
 1

The only documentation Google could find on Fix2 is for the isequal function in Base, which can similarly be transformed to a one-argument version.

in also accepts two arguments so you might find this approach a bit simpler:

julia> x = rand(1:5, 10)
10-element Array{Int64,1}:
 2
 1
 3
 3
 2
 3
 3
 5
 4
 4

julia> in.(x, Ref([1,2,3]))
10-element BitArray{1}:
  true
  true
  true
  true
  true
  true
  true
 false
 false
 false

(note you have to protect the collection you are searching in from broadcasting)

1 Like

Is it worth adding this somewhere to the docs or your tutorial? Apologies if I’ve missed it, but this was a question I asked a while back myself and I feel like I’ve answered a few similar ones here and on Slack since.