Struggling to implement Tables.jl interface (again)

Sorry for yet another post on this. It is very similar to previous ones, but I just cannot get it to work.

I am trying to implement a Tables.jl interface for a Vector{JSON3.Object}. Here is my effort so far:


using JSON3, Tables

# Wraps the parsed objects together with the column names shared by all rows.
struct JSONLines
    objects::Vector{JSON3.Object}
    names::Vector{Symbol}
end

struct Row <: Tables.AbstractRow
    row::Int
    source::JSONLines
end

Tables.rowaccess(t::JSONLines) = true
function Tables.rows(t::JSONLines)
    return [Row(i, t) for i in 1:length(t)]
end

Base.eltype(t::JSONLines) = Row  # iterate below yields Row wrappers, not raw JSON3.Object
Base.length(t::JSONLines) = length(getfield(t, :objects))
Base.iterate(t::JSONLines, st = 1) = st > length(t) ? nothing : (Row(st, t), st + 1)
Base.size(t::JSONLines, dim = 1) = dim == 1 ? length(t) : length(getfield(t, :names))

Tables.istable(::Type{<:JSONLines}) = true


# Positional access: translate the column index to its name, then look it up.
function Tables.getcolumn(row::Row, i::Int)
    src = getfield(row, :source)
    return getfield(src, :objects)[getfield(row, :row)][Tables.columnnames(row)[i]]
end
function Tables.getcolumn(row::Row, nm::Symbol) 
    src = getfield(row, :source)
    obj = getfield(src, :objects)
    srcrow = obj[getfield(row, :row)]
    return srcrow[nm]
end
Tables.columnnames(row::Row) = getfield(getfield(row, :source), :names)

A typical input would look like this:

input = JSON3.read.(["""{"A":1, "B":"hi"}""", """{"A":2, "B":"hello"}"""])
nms = collect(keys(input[1]))
jsnl = JSONLines(input, nms)

This is where I get stuck:

julia> using DataFrames

julia> DataFrame(jsnl)
2×2 DataFrame
│ Row │ A     │ B      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ hi     │
│ 2   │ 2     │ hello  │

julia> using StructArrays

julia> StructArray(jsnl)
ERROR: KeyError: key :row not found

Any idea what I am doing wrong?
Thanks!

EDIT:
I checked all the functions in the Tables.jl “user interface” and they seem to work:

julia> jsnl = JSONLines(input, nms)
JSONLines(JSON3.Object[{
   "A": 1,
   "B": "hi"
}, {
   "A": 2,
   "B": "hello"
}], [:A, :B])

julia> rws = Tables.rows(jsnl)
2-element Array{Row,1}:
 Row: (A = 1, B = "hi")
 Row: (A = 2, B = "hello")

julia> Tables.columnnames(rws[1])
2-element Array{Symbol,1}:
 :A
 :B

julia> Tables.getcolumn(rws[1], 1)
1

julia> Tables.getcolumn(rws[2], :B)
"hello"

julia> cls = Tables.columns(jsnl)
Tables.CopiedColumns{NamedTuple{(:A, :B),Tuple{Array{Int64,1},Array{String,1}}}}: (A = [1, 2], B = ["hi", "hello"])

julia> Tables.getcolumn(cls, 1)
2-element Array{Int64,1}:
 1
 2

julia> Tables.getcolumn(cls, :B)
2-element Array{String,1}:
 "hi"
 "hello"

EDIT 2:

I also tested with JuliaDB, and it seems to work:

julia> using JuliaDB

julia> table(jsnl)
Table with 2 rows, 2 columns:
A  B
──────────
1  "hi"
2  "hello"

At the moment, StructArrays is a bit problematic with types that overload getproperty: it tries to reconstruct the whole object (not just the named tuple), so it gets confused when the properties do not correspond to the fields (though this should be fixed in the next release).
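
To make the field/property mismatch concrete, here is a minimal sketch with a hypothetical DemoRow type (not from the question). It relies on Tables.AbstractRow forwarding property access to Tables.getcolumn, so the struct's fields and the names visible as properties are two different sets:

```julia
using Tables

# Hypothetical row type for illustration: a row index plus the backing data.
struct DemoRow <: Tables.AbstractRow
    row::Int
    data::Vector{NamedTuple{(:A, :B), Tuple{Int, String}}}
end

Tables.columnnames(r::DemoRow) = (:A, :B)
Tables.getcolumn(r::DemoRow, nm::Symbol) = getfield(getfield(r, :data)[getfield(r, :row)], nm)
Tables.getcolumn(r::DemoRow, i::Int) = getfield(getfield(r, :data)[getfield(r, :row)], i)

r = DemoRow(1, [(A = 1, B = "hi")])

fieldnames(DemoRow)   # the struct's fields: row and data
propertynames(r)      # the column names: A and B
r.A                   # goes through Tables.getcolumn
getfield(r, :row)     # the only way to reach the actual field
# r.row would be treated as a lookup of a column named :row and fail.
```

Any code that assumes r.row reaches the field will instead ask for a column called :row, which matches the shape of the KeyError in the question.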

The easiest workaround is StructArray(Tables.columntable(jsnl)), where Tables.columntable gives you a named tuple of columns corresponding to the table.
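
A minimal sketch of that workaround, using the sample data from the question. For brevity it passes the parsed vector directly to Tables.columntable rather than the JSONLines wrapper, which assumes the vector of JSON3 objects is itself accepted as a table:

```julia
using JSON3, Tables, StructArrays

# Same sample data as in the question.
input = JSON3.read.(["""{"A":1, "B":"hi"}""", """{"A":2, "B":"hello"}"""])

# Materialize the table as a named tuple of column vectors.
cols = Tables.columntable(input)

# Build the StructArray from the columns instead of from the row objects,
# so StructArrays never has to reconstruct a row type.
sa = StructArray(cols)

sa.A      # the whole A column
sa[2].B   # field B of the second row
```

The resulting elements are plain named tuples, so the custom row type with its overloaded getproperty is not involved.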

OTOH, I suspect that a Vector{JSON3.Object} may already respect the Tables interface. For example, what happens if you do Tables.columntable(v::Vector{JSON3.Object})?

Thank you, that seems to work!

julia> input = JSON3.read.(["""{"A":1, "B":"hi"}""", """{"A":2, "B":"hello"}"""])
2-element Array{JSON3.Object{Base.CodeUnits{UInt8,String},Array{UInt64,1}},1}:
 {
   "A": 1,
   "B": "hi"
}
 {
   "A": 2,
   "B": "hello"
}

julia> Tables.columntable(input)
(A = [1, 2], B = ["hi", "hello"])