I have been experimenting with different NLopt optimizers on particular examples of models from MixedModels. The current set of results is available at
For each id I want to determine the minimum objective attained over all the optimizers and subtract that from the obj to determine the :excess. I have the feeling that I should be able to do that one pass to obtain the minimum and one pass to do the join and the subtraction. But I always seem to end up with 3 passes. For example,
Stuff gets loaded via CSVFiles.jl. Note that Iβm not materializing this into something like a DataFrame, instead the query will run directly off the data from the file.
Now group things by the content of the id column. What this does is pass on a sequence of groups. Each group is like an array (all the rows that belong to that group), plus the key that belongs to the group (i.e. the value by which things were grouped).
We are now creating a new row/named tuple with the {} syntax. The _ refers to the current element, i.e. it will always refer to a group. First we want to extract the minimum value for the obj column for each group. The syntax _..obj will extract the obj column from the current group (remember, a group is a set of rows, so they have columns). We then pass all the rows of the current group into the rows column of the named tuple we create. So this step in the pipeline will pass on a table, where one column will actually have little tables in each cell.
This row βflattensβ the nested list. The input here is still one element per group. The first argument to @mapmany tells Query.jl how to go from one of the input elements, to n-output elements. In our case, we return all the rows that we had previously stored. So now we are back to a representation where we have one element for each element that we originally started with, i.e. we have undone the grouping. The second argument to @mapmany creates a named tuple/row. We can reference the input element by _, and the current element that we get by iterating the result of the first argument by __.
Materializes the whole thing into a DataFrame. No need to do that, actually, you could also materialize it into an IndexedTables.jl, or save it to disc right away with say save("foo.csv") or save("foo.feather").
If you are using JuliaDB tables, the JuliaDBMeta way of doing this would be:
@transform_vec res :id {excess = :obj .- minimum(:obj)}
To break it down, @transform_vec adds or modifies columns (and works vectorially, i.e. it is a column-wise macro). The second optional argument is for grouping (meaning it will group by :id and apply the operation separately to each group). The result is still grouped, to have it as just one table you can do either:
@transform_vec res :id flatten=true {excess = :obj .- minimum(:obj)}
or call flatten on the result. You can use @apply to do this operation and filtering in a pipeline (and move the grouping in the @apply call, so that you can filter in each group before flattening):
@apply res :id flatten=true begin
@transform_vec {excess = :obj .- minimum(:obj)}
@where :excess > 0.2
end