Defining `getindex` for `Tables.columntable` objects

Let’s say you are working with tabular objects and, for convenience, want to use t[i, j] indexing via getindex (and not many other functions) throughout your code.

On the other hand you want to support the Tables.jl interface. Therefore when you receive a function argument t, you wrap it in Tables.columntable(t).

As far as I can tell, the Tables interface does not support indexing. But many implementations of Tables.columntable do support that, for instance Tables.columntable(df::DataFrame) = df.

Therefore it makes sense to do the following in your code:

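# Thin wrapper that adds t[i, j] indexing to any column-accessible table
struct IndexableColumnTable{T}
    table::T
end

# Row i of the j-th column, looked up by position among the property names
function Base.getindex(t::IndexableColumnTable, i::Int, j::Int)
    p = propertynames(t.table)[j]
    return getproperty(t.table, p)[i]
end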

Then in your function do:

function foo(t)
    t = Tables.columntable(t)
    # Wrap only if the table does not already support t[i, j]
    if hasmethod(getindex, (typeof(t), Int, Int))
        return t
    else
        return IndexableColumnTable(t)
    end
end

Is this a reasonable approach?

I’m afraid this isn’t right. First, I think you are confusing Tables.columntable and Tables.columns: the former always returns a named tuple of vectors, so if you do t = columntable(t) you never get an object which can be indexed with two integers (even if the input was e.g. a DataFrame).
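To make the first point concrete, here is a small sketch. The input `rows` is just an illustrative row table (a vector of named tuples), not anything from your code, but the same should hold for a DataFrame:

```julia
using Tables

# Illustrative input: a vector of named tuples is a valid Tables.jl row table
rows = [(a = 1, b = 2.0), (a = 3, b = 4.0)]

# columntable always materializes a NamedTuple of vectors
ct = Tables.columntable(rows)

ct isa NamedTuple   # true
ct.a                # the first column, a Vector{Int}

# A NamedTuple has no two-integer getindex method, so I would expect
# this check to fail and the wrapper branch to be taken every time.
hasmethod(getindex, (typeof(ct), Int, Int))
```

So with columntable the hasmethod branch in foo is effectively dead code.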

Second, supposing you did t = Tables.columns(t), the Tables.jl interface does not guarantee that getproperty(t.table, p) returns a vector, only an iterator. So I think you have to check whether the returned iterators are AbstractVectors, and if not call collect on them. Then you can store the result as a named tuple of vectors (which is the most basic type of table) or use Tables.materializer(t)(cols) to create a table object of the same type as t.
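Something along these lines might work as a starting point. It is only a sketch: `vectorcolumns` is a name I made up, and it assumes a Tables.jl version that provides Tables.columnnames and Tables.getcolumn:

```julia
using Tables

# Hypothetical helper (not part of Tables.jl): return a NamedTuple of
# vectors, collecting any column that is not already an AbstractVector.
function vectorcolumns(t)
    cols = Tables.columns(t)
    colnames = Tables.columnnames(cols)
    vecs = map(colnames) do name
        col = Tables.getcolumn(cols, name)
        col isa AbstractVector ? col : collect(col)
    end
    # colnames may be a tuple or a vector of Symbols, so normalize both
    return NamedTuple{Tuple(colnames)}(Tuple(vecs))
end

nt = vectorcolumns([(a = 1, b = "x"), (a = 2, b = "y")])
nt.a[2]   # indexing a column by row is now safe
```

If you need the original table type back, passing the result to Tables.materializer(t) should do it, as long as that type accepts a column table as input.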

Maybe Tables.jl could provide a function to do this more easily. Feel free to file an issue.

Thanks for the answer. It’s probably best just to use Tables.matrix in this context, even if it copies.
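For reference, a minimal sketch of what I mean, using an illustrative row table with a single element type so the matrix eltype stays simple:

```julia
using Tables

# Illustrative input; any Tables.jl-compatible table should work here
rows = [(a = 1, b = 2), (a = 3, b = 4)]

# Tables.matrix copies the table into a dense Matrix (rows x columns)
m = Tables.matrix(rows)

m[2, 1]   # row 2 of column :a
```

The column names are lost in the conversion, which is fine as long as the code only relies on positional t[i, j] access.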

I get why Tables would want to be agnostic about the layout of columns, allowing for iterators or infinite streaming.