In MLJ.jl, we are trying to write a data-agnostic machine-learning framework and are presently trying to make this work using Query, to which I am new.
We would like to write methods that take an iterable table as input and that output a table of identical type (assuming the type is sink-supported). For example, a function to standardise the numerical features (columns) of some table, or a function to project the table onto a smaller training set of rows. If I give my method a DataFrame, I want a DataFrame as output. If I give it a TypedTable, then the output should be a TypedTable.
Here’s my attempt at a function to select a subset of columns from some Query iterable table, returning an object of the same type:
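    using Query, TableTraits

    # Select the columns of X listed in c (names or positions) and try to
    # collect the result into the same type T as the input.
    function getcols(X::T, c::AbstractArray{I}) where {T,I<:Union{Symbol,Integer}}
        TableTraits.isiterabletable(X) || error("Argument is not an iterable table.")
        row_iterator = @from row in X begin
            @select project(row, c)
            @collect T
        end
    end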
Here project(row, c) is just the projection of the named tuple row onto a named tuple with only those labels/indices specified by c.
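For concreteness, here is a minimal sketch of what such a project helper could look like, using only Base Julia. It is an assumption about the helper's behaviour based on the description above, not the actual MLJ code, and the sample row is made up for illustration.

```julia
# Hypothetical helper: project a named tuple onto the fields selected by c.
# c may hold field names (Symbols) or field positions (Integers).
project(row::NamedTuple, c::AbstractVector{Symbol}) = NamedTuple{Tuple(c)}(row)

function project(row::NamedTuple, c::AbstractVector{<:Integer})
    names = Tuple(keys(row)[i] for i in c)   # translate positions into labels
    return NamedTuple{names}(row)            # keep only those fields, in that order
end

row = (x1 = 1.0, x2 = "a", x3 = 3)   # made-up sample row
project(row, [:x1, :x3])             # (x1 = 1.0, x3 = 3)
project(row, [1, 2])                 # (x1 = 1.0, x2 = "a")
```

Because the labels come from a runtime vector, the compiler cannot infer the type of the projected row in this sketch, which may matter for performance on large tables.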
Now getcols works for a DataFrame but not for, say, a TypedTable. The problem is that TypedTable has type parameters which will be different for the output than the input. In other words, T in this case is TypedTables.Table{NamedTuple{(:x1, :x2........ for the signature, but T in the collect statement needs to be just the unparametrized TypedTables.Table to work (I guess).
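The mismatch can be reproduced without Query or TypedTables. The sketch below uses a made-up parametric type, ToyTable, as a stand-in for a table whose row type is a type parameter. The bare function is a hypothetical helper that strips the parameters through Base.typename(T).wrapper, which is an internal field rather than documented API. Whether @collect accepts the resulting unparametrized type for any particular sink is not established here.

```julia
# Stand-in for a parametric table type; NOT the real TypedTables.Table.
struct ToyTable{R}
    rows::Vector{R}
end

# Hypothetical helper: strip the type parameters, e.g. ToyTable{R} -> ToyTable.
# Assumption: relies on the internal `wrapper` field of the type name.
bare(::Type{T}) where {T} = Base.typename(T).wrapper

X = ToyTable([(x1 = 1, x2 = 2.0), (x1 = 3, x2 = 4.0)])
T = typeof(X)                               # ToyTable with the two-column row type

narrowed = [(x1 = r.x1,) for r in X.rows]   # rows keeping only the x1 column
Y = bare(T)(narrowed)                       # rebuild; the row type is re-inferred

typeof(Y) == T     # false: the type parameter changed with the columns
Y isa bare(T)      # true: same unparametrized type
```

Constructing with the fully parametrized T instead would try to convert the narrowed rows to the original row type, which is the same kind of failure described above.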
So what’s a way to do this that works?