I have a table with three keys: NR=Int64, DT1=Date, DT2=Date, and DT2 is always before DT1.
I want to filter
the table with two conditions:
- DT1=Date(2017, 3, 31)
- choose the newest by DT2 (i.e. closest to DT1) of possibly multiple entries that have the same NR and DT1.
Very similar to a question on StackExchange, just that I want NR to be the group-by variable.
While this is easily formulated as a SQL query, my implementation in JuliaDB is complicated and not exaclty fast.
I filter the correct date, then figure out the correct DT2 per NR, cast it into a Dictionary, and finally filter the entries by the correct DT2:
select_date = filter(d -> d == Date(2017, 3, 31), data_table, select = :DT1)
correct_DT2 = collect(groupby(@NT(DT2=d -> maximum(d)), select_date , :NR, select=:DT2))
dict = Dict{Int64, Date}()
for i in 1:length(correct_DT2)
dict[correct_DT2[i].NR] = correct_DT2[i].DT2
end
filter(row -> row.DT2 == dict[row.NR], select_date, select = (:NR, :DT2))
It seems to me that the filter function has less functionality than SQL, for example the GROUP BY
qualifier.
Any thought on how to do it better?