Help implementing Tables.jl interface

PR with code here: https://github.com/omnisci/OmniSci.jl/pull/88/files

I’m looking to modify some functions that take DataFrames as arguments to be more generic to take Tables, but I’m still not sure I understand why Tables works. Specifically, take the following code:

function load_table(conn::OmniSciConnection, table_name::String, df::DataFrame)

    df_to_array = [OmniSci.TStringRow(x) for x in DataFrames.eachrow(df)]
    load_table(conn, table_name, df_to_array)

end

I re-wrote this code for Tables like the following:

function load_table(conn::OmniSciConnection, table_name::String, tbl_obj)

    tbl_to_array = [OmniSci.TStringRow(x) for x in Tables.rowtable(tbl_obj)]
    load_table(conn, table_name, tbl_to_array)

end

This example works, but where I’m getting stuck is with the lack of dispatch, ensuring that someone doesn’t pass a generic Table that won’t work.

If I understand the Tables.jl interface theory, people should be able to pass any Table and not worry about it; I as the library maintainer am supposed to deal with this appropriately. In my case, the load_table method has to be row-wise for the database to understand it; does the way I’ve written the code ensure this? Or does this just happen to work because a DataFrame can be iterated over row-wise?

Yes, calling Tables.rowtable ensures you get a vector of named tuples (as documented in README.md). But that’s really inefficient for your situation since it makes a copy. I think you want to use Tables.rows, which will return the most appropriate type iterator over rows.

Thanks…I did checkout the README, and even Tables.rows, but got this error:


function load_table(conn::OmniSciConnection, table_name::String, tbl_obj)

    tbl_to_array = [OmniSci.TStringRow(x) for x in Tables.rows(tbl_obj)]
    load_table(conn, table_name, tbl_to_array)

end

 MethodError: no method matching iterate(::Tables.ColumnsRow{NamedTuple{(:x1, :x2, :x3, :x4, :x5, :x6, :x7, :x8, :x9, :x10, :x11),Tuple{Array{Union{Missing, Int8},1},Array{Union{Missing, Int16},1},Array{Union{Missing, Int32},1},Array{Union{Missing, Int64},1},Array{Union{Missing, Float32},1},Array{Union{Missing, Float64},1},Array{Union{Missing, Bool},1},Array{String,1},Array{Union{Missing, Date},1},Array{Union{Missing, Time},1},Array{Union{Missing, DateTime},1}}}})
  Closest candidates are:
    iterate(!Matched::Core.SimpleVector) at essentials.jl:604
    iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:604
    iterate(!Matched::ExponentialBackOff) at error.jl:214
    ...
  Stacktrace:
   [1] iterate at ./generator.jl:44 [inlined]
   [2] collect(::Base.Generator{Tables.ColumnsRow{NamedTuple{(:x1, :x2, :x3, :x4, :x5, :x6, :x7, :x8, :x9, :x10, :x11),Tuple{Array{Union{Missing, Int8},1},Array{Union{Missing, Int16},1},Array{Union{Missing, Int32},1},Array{Union{Missing, Int64},1},Array{Union{Missing, Float32},1},Array{Union{Missing, Float64},1},Array{Union{Missing, Bool},1},Array{String,1},Array{Union{Missing, Date},1},Array{Union{Missing, Time},1},Array{Union{Missing, DateTime},1}}}},Type{OmniSci.TStringValue}}) at ./array.jl:606
   [3] OmniSci.TStringRow(::Tables.ColumnsRow{NamedTuple{(:x1, :x2, :x3, :x4, :x5, :x6, :x7, :x8, :x9, :x10, :x11),Tuple{Array{Union{Missing, Int8},1},Array{Union{Missing, Int16},1},Array{Union{Missing, Int32},1},Array{Union{Missing, Int64},1},Array{Union{Missing, Float32},1},Array{Union{Missing, Float64},1},Array{Union{Missing, Bool},1},Array{String,1},Array{Union{Missing, Date},1},Array{Union{Missing, Time},1},Array{Union{Missing, DateTime},1}}}}) at /home/rzwitch/.julia/dev/OmniSci/src/constructors.jl:240
   [4] iterate at ./generator.jl:47 [inlined]
   [5] collect(::Base.Generator{Tables.RowIterator{NamedTuple{(:x1, :x2, :x3, :x4, :x5, :x6, :x7, :x8, :x9, :x10, :x11),Tuple{Array{Union{Missing, Int8},1},Array{Union{Missing, Int16},1},Array{Union{Missing, Int32},1},Array{Union{Missing, Int64},1},Array{Union{Missing, Float32},1},Array{Union{Missing, Float64},1},Array{Union{Missing, Bool},1},Array{String,1},Array{Union{Missing, Date},1},Array{Union{Missing, Time},1},Array{Union{Missing, DateTime},1}}}},Type{OmniSci.TStringRow}}) at ./array.jl:606
   [6] load_table(::OmniSci.OmniSciConnection, ::String, ::DataFrame) at /home/rzwitch/.julia/dev/OmniSci/src/client.jl:404
   [7] top-level scope at /home/rzwitch/.julia/dev/OmniSci/test/runtests.jl:155
   [8] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Test/src/Test.jl:1113
   [9] top-level scope at /home/rzwitch/.julia/dev/OmniSci/test/runtests.jl:124

Based on your explanation, Tables.rowtable makes the copy the works with my existing codebase; this error is saying that I don’t have an iterator method defined?

A reading with fresh eyes answers my other question…it appears I should evaluating istable = true) since dispatch isn’t available :

> For consumers, this function should always be called on instances (like Tables.istable(x) ), to ensure input tables are appropriately supported

No, apparently most shouldn’t validate

Note that the function didn’t do any validation on the input to check if it was a valid table: Tables.rows(x) will throw an error if x doesn’t actually satisfy the Tables.jl interface. Alternatively, we could call Tables.istable(x) (as shown in the commented line at the start of the function) on the input before calling Tables.rows(x) if we needed to restrict things to known, valid Tables.jl. Note that doing this will prevent certain, valid table inputs from being consumed, due to their inability to confidently return true for Tables.istable , even at runtime (cases like Generator s, or Vector{Any} ). In short, most package just call Tables.rows , allowing invalid source errors to be thrown while also accepting the maximum number of possible valid inputs.

Actually the error is due to the fact that OmniSci.TStringRow tries to call collect on the row returned by the iterator. But Tables.jl doesn’t guarantee that you can iterate row objects, only that you can call propertynames and getproperty on these names. You may want to read and/or comment at https://github.com/JuliaData/Tables.jl/issues/75

Thanks for getting me over the hurdle, at least I’m not (ncessarily) doing this wrong, just that its unsupported.