I have a table with multiple entries.
For each entry I have the timestamp it was fetched and the timestamp it represents, e.g. forecasts for a specific timestamp computed at different times.
I would like to query the newest entry for a given vector of timestamps.
Is there a way to do this with Query.jl?
In SQL I would do something like:
SELECT MAX(t_fetch), * 
FROM data
WHERE date
BETWEEN t_from AND t_to
GROUP BY date ORDER BY date ASC
With the following i get all forecasts that where made for the given timestamp
df2 = df |> @query(d, begin
    @orderby d.date, descending(d.fetch)
    @where d.date == timestamp
    @select d
end) |> DataFrame
from this dataframe I would have to select the first element and do this for all timestamps in the given vector.
df3 = df2 |> @take(1) |> DataFrame
Can I combine this with two nested queries somehow?
             
            
              
              
              
            
            
           
          
            
            
              I have also tried:
df |> 
    @filter(_.date == time[1]) |>  
    @orderby_descending(_.fetch) |> 
    @take(1) |> 
    DataFrame
I would have to repeat this for all t in time and combine the result in a single DataFrame.
This is what I think I should do:
df |> 
    @groupby(_.date) |> 
    @orderby_descending(_.fetch) |> 
    @take(1) |> 
    @map(DataFrames.DataFrame(_)) |>
    collect |> first
However, I get the wrong result. I now have every fetch for the first date.
             
            
              
              
              
            
            
           
          
            
            
              in base DataFrames I think what you want is
ulia> df = DataFrame(a = [1, 1, 2, 2], b = rand(4))
4×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.319191 │
│ 2   │ 1     │ 0.400976 │
│ 3   │ 2     │ 0.919093 │
│ 4   │ 2     │ 0.507223 │
julia> sort!(df, [:a, :b])
4×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.319191 │
│ 2   │ 1     │ 0.400976 │
│ 3   │ 2     │ 0.507223 │
│ 4   │ 2     │ 0.919093 │
julia> combine(first, groupby(df, :a))
2×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.319191 │
│ 2   │ 2     │ 0.507223 │
But I’m not 100% sure thats what you want to do. Not sure how to do it with Query, sorry.
             
            
              
              
              
            
            
           
          
            
            
              I found a working solution
dfq = vcat([
    (
        df |>
        @filter(_.date == t) |>
        @orderby_descending(_.fetch) |>
        @take(1) |>
        DataFrame
    ) for t in time
]...)
but it is not really elegant.
             
            
              
              
              
            
            
           
          
            
            
              And with the solution from @pdeffebach
dfq2 =
    df |>
    x -> DataFrames.sort(x, ["fetch", "date"], rev = (true, false)) |>
    x -> DataFrames.combine(first, DataFrames.groupby(x, "date")) |>
    x -> Dataframes.filter("date" => x -> time[1] <= x <= time[end], x) |> 
    DataFrames.DataFrame