Table transformation in JuliaDB or Queryverse or DataFramesMeta?


I have been experimenting with different NLopt optimizers on particular examples of models from MixedModels. The current set of results is available at

For each id I want to determine the minimum objective attained over all the optimizers and subtract that from the obj to determine the :excess. I have the feeling that I should be able to do that one pass to obtain the minimum and one pass to do the join and the subtraction. But I always seem to end up with 3 passes. For example,

julia> using JuliaDB

julia> res = loadtable("/home/bates/git/MixedModelExamples/results/results.csv",indexcols=[1,2]);

julia> resaug = join(res, groupby(minimum, res, :id, select=:obj), how=:left, lkey = :id);

julia> resaug = pushcol(resaug, :excess, map(r -> r.obj - r.minimum, resaug));

julia> filter(r -> r.excess > 0.02, resaug)
Table with 15 rows, 8 columns:
id    opt              code               nfeval  obj        elapsed      minimum    excess
"A0"  "LN_NELDERMEAD"  "FTOL_REACHED"     28025   1031.93    12.2112      1030.96    0.970225
"A0"  "LN_PRAXIS"      "FTOL_REACHED"     549     1049.15    0.243573     1030.96    18.1959
"A3"  "LN_NELDERMEAD"  "MAXEVAL_REACHED"  100000  28675.1    137.02       28586.3    88.7686
"A3"  "LN_PRAXIS"      "FTOL_REACHED"     2883    28617.8    3.81868      28586.3    31.4375
"A3"  "LN_SBPLX"       "FTOL_REACHED"     16869   28591.3    22.9396      28586.3    4.96359
"A4"  "LN_NELDERMEAD"  "FTOL_REACHED"     11367   28674.3    15.3279      28670.9    3.41974
"A4"  "LN_PRAXIS"      "FTOL_REACHED"     503     28695.1    0.689952     28670.9    24.206
"AP"  "LN_NELDERMEAD"  "FTOL_REACHED"     3439    8.85058e5  88.9352      8.84958e5  100.191
"Ab"  "LN_COBYLA"      "FTOL_REACHED"     5       327.498    0.000166279  327.327    0.17135
"Ae"  "LN_NELDERMEAD"  "FTOL_REACHED"     68      14773.3    0.00268521   14773.2    0.0401451
"An"  "LN_COBYLA"      "FTOL_REACHED"     1279    7927.57    0.327538     7926.68    0.895492
"An"  "LN_NELDERMEAD"  "FTOL_REACHED"     44466   7927.5     10.2801      7926.68    0.822917
"An"  "LN_PRAXIS"      "FTOL_REACHED"     1984    7926.84    0.465446     7926.68    0.168768
"An"  "LN_SBPLX"       "FTOL_REACHED"     2384    7927.5     0.544271     7926.68    0.821954
"Ar"  "LN_NELDERMEAD"  "FTOL_REACHED"     44      -257.409   0.00131563   -260.628   3.21928

Is there a better way to do this operation in JuliaDB or in Queryverse or using DataFramesMeta?


If I understand what you want correctly this how you can do it with DataFramesMeta:

@transform(groupby(df, :id), excess = :obj - minimum(:obj))

And to produce the table you present you can use:

@linq df |>
    groupby(:id) |>
    transform(excess = :obj - minimum(:obj)) |>
    where(:excess .> 0.02)

EDIT: this assumes that you have loaded your data into df which is a DataFrame.


I think this does what you want:

using Queryverse

load("results.csv") |> # Step 1
@groupby( |> # Step 2
@map({best_obj=minimum(_..obj), rows=_}) |> # Step 3
@mapmany(_.rows, {, __.opt, __.code, __.nfeval, __.obj, __.elapsed, excess=__.obj - _.best_obj}) |> # Step 4
DataFrame # Step 5

Here are some explanations for each step:

  1. Stuff gets loaded via CSVFiles.jl. Note that I’m not materializing this into something like a DataFrame, instead the query will run directly off the data from the file.
  2. Now group things by the content of the id column. What this does is pass on a sequence of groups. Each group is like an array (all the rows that belong to that group), plus the key that belongs to the group (i.e. the value by which things were grouped).
  3. We are now creating a new row/named tuple with the {} syntax. The _ refers to the current element, i.e. it will always refer to a group. First we want to extract the minimum value for the obj column for each group. The syntax _..obj will extract the obj column from the current group (remember, a group is a set of rows, so they have columns). We then pass all the rows of the current group into the rows column of the named tuple we create. So this step in the pipeline will pass on a table, where one column will actually have little tables in each cell.
  4. This row β€œflattens” the nested list. The input here is still one element per group. The first argument to @mapmany tells Query.jl how to go from one of the input elements, to n-output elements. In our case, we return all the rows that we had previously stored. So now we are back to a representation where we have one element for each element that we originally started with, i.e. we have undone the grouping. The second argument to @mapmany creates a named tuple/row. We can reference the input element by _, and the current element that we get by iterating the result of the first argument by __.
  5. Materializes the whole thing into a DataFrame. No need to do that, actually, you could also materialize it into an IndexedTables.jl, or save it to disc right away with say save("foo.csv") or save("foo.feather").


Thank you both for the explanations.


Wait, does all that code just do the same as @bkamins DataFramesMeta code just above?


If you are using JuliaDB tables, the JuliaDBMeta way of doing this would be:

@transform_vec res :id {excess = :obj .- minimum(:obj)}

To break it down, @transform_vec adds or modifies columns (and works vectorially, i.e. it is a column-wise macro). The second optional argument is for grouping (meaning it will group by :id and apply the operation separately to each group). The result is still grouped, to have it as just one table you can do either:

@transform_vec res :id flatten=true {excess = :obj .- minimum(:obj)}

or call flatten on the result. You can use @apply to do this operation and filtering in a pipeline (and move the grouping in the @apply call, so that you can filter in each group before flattening):

@apply res :id flatten=true begin
   @transform_vec {excess = :obj .- minimum(:obj)}
   @where :excess > 0.2