JuliaDB filter: select one of multiple rows


#1

I have a table with three keys: NR=Int64, DT1=Date, DT2=Date, and DT2 is always before DT1.

I want to filter the table with two conditions:

  • DT1=Date(2017, 3, 31)
  • choose the newest by DT2 (i.e. closest to DT1) of possibly multiple entries that have the same NR and DT1.

Very similar to a question on StackExchange, just that I want NR to be the group-by variable.

While this is easily formulated as a SQL query, my implementation in JuliaDB is complicated and not exaclty fast.
I filter the correct date, then figure out the correct DT2 per NR, cast it into a Dictionary, and finally filter the entries by the correct DT2:

select_date = filter(d -> d == Date(2017, 3, 31), data_table, select = :DT1)
correct_DT2 = collect(groupby(@NT(DT2=d -> maximum(d)), select_date , :NR, select=:DT2))

dict = Dict{Int64, Date}()
for i in 1:length(correct_DT2)
   dict[correct_DT2[i].NR] = correct_DT2[i].DT2
end

filter(row -> row.DT2 == dict[row.NR], select_date, select = (:NR, :DT2))

It seems to me that the filter function has less functionality than SQL, for example the GROUP BY qualifier.

Any thought on how to do it better?