Let’s say you are working with tabular objects and, for convenience, want to use t[i, j] indexing via getindex (and not too many other functions) throughout your code.
On the other hand, you want to support the Tables.jl interface. Therefore, when you receive a function argument t, you wrap it in Tables.columntable(t).
As far as I can tell, the Tables interface does not support indexing. But many implementations of Tables.columntable do support it, for instance Tables.columntable(df::DataFrame) = df.
Therefore it makes sense to do the following in your code:
struct IndexableColumnTable{T}
    table::T
end
# t[i, j] returns row i of the j-th column
function Base.getindex(t::IndexableColumnTable, i::Int, j::Int)
    p = propertynames(t.table)[j]
    getproperty(t.table, p)[i]
end
Then in your function do:
function foo(t)
    t = Tables.columntable(t)
    if hasmethod(getindex, (typeof(t), Int, Int))
        return t
    else
        return IndexableColumnTable(t)
    end
end
Is this a reasonable approach?
I’m afraid this isn’t right. First, I think you are confusing Tables.columntable and Tables.columns: the former always returns a named tuple of vectors, so if you do t = columntable(t) you never get an object that can be indexed with two integers (even if the input was, e.g., a DataFrame).
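To illustrate the first point, here is a minimal sketch using only Base. It assumes a hand-written named tuple of vectors as a stand-in for what Tables.columntable returns, and it repeats the wrapper from the question with the i and j arguments that getindex needs. The data and the supports_ij name are made up for the example.

```julia
# Stand-in for the result of Tables.columntable: a named tuple of vectors.
t = (a = [1, 2, 3], b = [10.0, 20.0, 30.0])

# Ask whether a two-integer getindex method applies to this type.
supports_ij = hasmethod(getindex, Tuple{typeof(t), Int, Int})

# Wrapper from the question, with the row and column indices in the signature.
struct IndexableColumnTable{T}
    table::T
end

function Base.getindex(t::IndexableColumnTable, i::Integer, j::Integer)
    name = propertynames(t.table)[j]      # j-th column name
    return getproperty(t.table, name)[i]  # i-th row of that column
end

wrapped = IndexableColumnTable(t)
wrapped[2, 1]  # row 2 of column :a
```

Since the named tuple is always what you get back, the hasmethod branch in foo would not be needed and the wrapper could be applied unconditionally.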
Second, supposing you did t = Tables.columns(t), the Tables.jl interface does not guarantee that getproperty(t.table, p) returns a vector, only an iterator. So I think you have to check whether the returned iterators are AbstractVectors, and if not, call collect on them. Then you can store the result as a named tuple of vectors (which is the most basic type of table) or use Tables.materializer(t)(cols) to create a table object of the same type as t.
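A rough sketch of that check, written against Base only so it runs without any packages. The helper name vector_columns is hypothetical, and it assumes the columns object exposes its columns through propertynames and getproperty, as in the question. In real code you would pass it the result of Tables.columns(t).

```julia
# Hypothetical helper: turn any property-accessible columns object into
# a named tuple of vectors, collecting only the columns that are not
# already AbstractVectors.
function vector_columns(cols)
    names = Tuple(propertynames(cols))
    vals = map(names) do name
        col = getproperty(cols, name)
        col isa AbstractVector ? col : collect(col)
    end
    return NamedTuple{names}(vals)
end

# One column is already a vector-like range, the other is a lazy iterator.
cols = (a = 1:3, b = (x^2 for x in 1:3))
vecs = vector_columns(cols)
vecs.b[2]  # now indexable, because the generator has been collected
```

Columns that are already AbstractVectors are passed through without copying, so the cost is only paid for genuinely lazy columns.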
Maybe Tables.jl could provide a function to do this more easily. Feel free to file an issue.
Thanks for the answer. It’s probably best just to use Tables.matrix in this context, even if it copies.
I get why Tables would want to be agnostic about the layout of columns, allowing for iterators or infinite streaming.
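For reference, a small sketch of the Tables.matrix route. It assumes Tables.jl is installed, and the function name cell and the sample data are made up for the example.

```julia
using Tables  # third-party package, assumed to be installed

# Hypothetical helper: materialize the table as a Matrix, then index it.
# Tables.matrix copies the data, so in real code convert once and reuse m.
function cell(t, i::Integer, j::Integer)
    m = Tables.matrix(t)
    return m[i, j]
end

t = (a = [1, 2, 3], b = [10, 20, 30])
cell(t, 2, 1)  # row 2 of the first column
```

The trade-off is that a Matrix has a single element type, so this fits best when the columns are homogeneous.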