Table transformation in JuliaDB or Queryverse or DataFramesMeta?


#1

I have been experimenting with different NLopt optimizers on particular examples of models from MixedModels. The current set of results is available at

For each id I want to determine the minimum objective attained over all the optimizers and subtract that from :obj to determine the :excess. I have the feeling that I should be able to do this in one pass to obtain the minimum and one pass to do the join and the subtraction, but I always seem to end up with 3 passes. For example,

julia> using JuliaDB

julia> res = loadtable("/home/bates/git/MixedModelExamples/results/results.csv",indexcols=[1,2]);

julia> resaug = join(res, groupby(minimum, res, :id, select=:obj), how=:left, lkey = :id);

julia> resaug = pushcol(resaug, :excess, map(r -> r.obj - r.minimum, resaug));

julia> filter(r -> r.excess > 0.02, resaug)
Table with 15 rows, 8 columns:
id    opt              code               nfeval  obj        elapsed      minimum    excess
──────────────────────────────────────────────────────────────────────────────────────────────
"A0"  "LN_NELDERMEAD"  "FTOL_REACHED"     28025   1031.93    12.2112      1030.96    0.970225
"A0"  "LN_PRAXIS"      "FTOL_REACHED"     549     1049.15    0.243573     1030.96    18.1959
"A3"  "LN_NELDERMEAD"  "MAXEVAL_REACHED"  100000  28675.1    137.02       28586.3    88.7686
"A3"  "LN_PRAXIS"      "FTOL_REACHED"     2883    28617.8    3.81868      28586.3    31.4375
"A3"  "LN_SBPLX"       "FTOL_REACHED"     16869   28591.3    22.9396      28586.3    4.96359
"A4"  "LN_NELDERMEAD"  "FTOL_REACHED"     11367   28674.3    15.3279      28670.9    3.41974
"A4"  "LN_PRAXIS"      "FTOL_REACHED"     503     28695.1    0.689952     28670.9    24.206
"AP"  "LN_NELDERMEAD"  "FTOL_REACHED"     3439    8.85058e5  88.9352      8.84958e5  100.191
"Ab"  "LN_COBYLA"      "FTOL_REACHED"     5       327.498    0.000166279  327.327    0.17135
"Ae"  "LN_NELDERMEAD"  "FTOL_REACHED"     68      14773.3    0.00268521   14773.2    0.0401451
"An"  "LN_COBYLA"      "FTOL_REACHED"     1279    7927.57    0.327538     7926.68    0.895492
"An"  "LN_NELDERMEAD"  "FTOL_REACHED"     44466   7927.5     10.2801      7926.68    0.822917
"An"  "LN_PRAXIS"      "FTOL_REACHED"     1984    7926.84    0.465446     7926.68    0.168768
"An"  "LN_SBPLX"       "FTOL_REACHED"     2384    7927.5     0.544271     7926.68    0.821954
"Ar"  "LN_NELDERMEAD"  "FTOL_REACHED"     44      -257.409   0.00131563   -260.628   3.21928

Is there a better way to do this operation in JuliaDB or in Queryverse or using DataFramesMeta?
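For reference, the underlying logic (one pass to collect per-id minima, one pass to attach the excess) can be sketched in base Julia over a vector of named tuples; the toy rows and values below are hypothetical stand-ins for the results table:

```julia
# Toy rows standing in for the results table (hypothetical values).
rows = [
    (id = "A0", opt = "LN_NELDERMEAD", obj = 1031.93),
    (id = "A0", opt = "LN_BOBYQA",     obj = 1030.96),
    (id = "A3", opt = "LN_PRAXIS",     obj = 28617.8),
    (id = "A3", opt = "LN_SBPLX",      obj = 28586.3),
]

# Pass 1: minimum objective per id.
minobj = Dict{String,Float64}()
for r in rows
    minobj[r.id] = min(get(minobj, r.id, Inf), r.obj)
end

# Pass 2: attach the excess over the per-id minimum.
aug = [merge(r, (excess = r.obj - minobj[r.id],)) for r in rows]

filter(r -> r.excess > 0.02, aug)
```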


#2

If I understand what you want correctly, this is how you can do it with DataFramesMeta:

@transform(groupby(df, :id), excess = :obj .- minimum(:obj))

And to produce the table you present you can use:

@linq df |>
    groupby(:id) |>
    transform(excess = :obj .- minimum(:obj)) |>
    where(:excess .> 0.02)

EDIT: this assumes that you have loaded your data into df, which is a DataFrame.


#3

I think this does what you want:

using Queryverse

load("results.csv") |> # Step 1
@groupby(_.id) |> # Step 2
@map({best_obj=minimum(_..obj), rows=_}) |> # Step 3
@mapmany(_.rows, {__.id, __.opt, __.code, __.nfeval, __.obj, __.elapsed, excess=__.obj - _.best_obj}) |> # Step 4
DataFrame # Step 5

Here are some explanations for each step:

  1. Stuff gets loaded via CSVFiles.jl. Note that I’m not materializing this into something like a DataFrame, instead the query will run directly off the data from the file.
  2. Now group things by the content of the id column. What this does is pass on a sequence of groups. Each group is like an array (all the rows that belong to that group), plus the key that belongs to the group (i.e. the value by which things were grouped).
  3. We are now creating a new row/named tuple with the {} syntax. The _ refers to the current element, i.e. it will always refer to a group. First we want to extract the minimum value for the obj column for each group. The syntax _..obj will extract the obj column from the current group (remember, a group is a set of rows, so they have columns). We then pass all the rows of the current group into the rows column of the named tuple we create. So this step in the pipeline will pass on a table, where one column will actually have little tables in each cell.
  4. This step “flattens” the nested list. The input here is still one element per group. The first argument to @mapmany tells Query.jl how to go from one input element to n output elements. In our case, we return all the rows that we had previously stored. So now we are back to a representation where we have one element for each element that we originally started with, i.e. we have undone the grouping. The second argument to @mapmany creates a named tuple/row. We can reference the input element by _, and the current element that we get by iterating the result of the first argument by __.
  5. Materializes the whole thing into a DataFrame. No need to do that, actually, you could also materialize it into an IndexedTables.jl, or save it to disc right away with say save("foo.csv") or save("foo.feather").
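The grouping and flattening steps above can be mimicked in base Julia: @mapmany is essentially a flat-map where the result expression sees both the outer element (the group, `_`) and each inner element (the row, `__`). A minimal sketch with hypothetical data:

```julia
# Groups as they would come out of step 2 (hypothetical data).
groups = [
    (key = "A0", rows = [(obj = 1031.93,), (obj = 1030.96,)]),
    (key = "A3", rows = [(obj = 28617.8,), (obj = 28586.3,)]),
]

# Step 3: record each group's minimum alongside its rows.
withmin = [(best_obj = minimum(r.obj for r in g.rows), rows = g.rows)
           for g in groups]

# Step 4: flatten back to one element per original row, using both the
# outer element (the group) and the inner element (the row).
flat = [(obj = row.obj, excess = row.obj - g.best_obj)
        for g in withmin for row in g.rows]
```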

#4

Thank you both for the explanations.


#5

Wait, does all that code just do the same as @bkamins’ DataFramesMeta code just above?


#6

If you are using JuliaDB tables, the JuliaDBMeta way of doing this would be:

@transform_vec res :id {excess = :obj .- minimum(:obj)}

To break it down, @transform_vec adds or modifies columns (and works vectorially, i.e. it is a column-wise macro). The second optional argument is for grouping (meaning it will group by :id and apply the operation separately to each group). The result is still grouped; to have it as just one table you can do either:

@transform_vec res :id flatten=true {excess = :obj .- minimum(:obj)}

or call flatten on the result. You can use @apply to do this operation and filtering in a pipeline (and move the grouping in the @apply call, so that you can filter in each group before flattening):

@apply res :id flatten=true begin
   @transform_vec {excess = :obj .- minimum(:obj)}
   @where :excess > 0.02
end
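The pipeline above (group by :id, add :excess within each group, filter inside each group, then flatten) corresponds roughly to the following base-Julia logic, shown with hypothetical data:

```julia
# Hypothetical rows standing in for the results table.
rows = [(id = "A0", obj = 1031.93), (id = "A0", obj = 1030.96),
        (id = "A3", obj = 28617.8), (id = "A3", obj = 28586.3)]

# Group rows by :id.
groups = Dict{String,Vector{eltype(rows)}}()
for r in rows
    push!(get!(groups, r.id, eltype(rows)[]), r)
end

# Within each group: add :excess, filter, then flatten into one table.
result = [merge(r, (excess = r.obj - minimum(x.obj for x in g),))
          for g in values(groups) for r in g
          if r.obj - minimum(x.obj for x in g) > 0.02]
```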