I’m trying to make `Vector{MyStruct}` a table. One thing I have in mind is being able to save a `Vector{MyStruct}` to a Parquet file by just calling `write_parquet(file, tbl)`, which should be possible according to the last section of the README.md in [1]: “You can write any Tables.jl column-accessible table that contains columns of these types”. My problem is that I haven’t been able to figure out how to make `Vector{MyStruct}` a Tables.jl column-accessible table, even though I’ve looked at numerous examples.
Now, according to the Tables.jl documentation [2], all I should need is to define three functions:
- `Tables.istable(table)` - Declare that your table type implements the interface
- `Tables.columnaccess(table)` - Declare that your table type defines a `Tables.columns(table)` method
- `Tables.columns(table)` - Return an `Tables.AbstractColumns`-compatible object from your table

(Optionally, there’s also a `Tables.schema` function, which I’m happy to implement as well.)
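For the optional `Tables.schema`, the Tables.jl docs describe the return value as a `Tables.Schema` holding the column names and element types (or `nothing` when those are unknown). For the two `Float64` fields of `MyStruct`, that value would presumably be built as in this small sketch; which object the method should be defined on is left open here.

```julia
using Tables

# Column names and element types for a table whose columns are MyStruct's two Float64 fields
sch = Tables.Schema((:a, :b), (Float64, Float64))

sch.names  # (:a, :b)
sch.types  # (Float64, Float64)
```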
The first two are straightforward enough:
```julia
using DataFrames, Tables, Parquet

struct MyStruct
    a::Float64
    b::Float64
end

Tables.istable(::Type{<:Vector{MyStruct}}) = true
Tables.columnaccess(::Type{<:Vector{MyStruct}}) = true
```
Now regarding `Tables.columns`, I’m not sure what the phrase “Return an Tables.AbstractColumns-compatible object from your table” means. So I looked at what DataFrames does:
```julia
df = DataFrame(Dict(:a => [1.0, 2.0], :b => [2.0, 3.0]))
Tables.columns(df)
```
This returns the DataFrame itself. I guess it makes sense to think of a DataFrame as already being a columnar table, in the sense that I can call `getproperty(df, :a)` and get a vector.
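For comparison, a plain `NamedTuple` of vectors behaves the same way as the DataFrame here: Tables.jl treats it as a column table, `Tables.columns` returns it unchanged, `getproperty` gives back each column as a vector, and its schema is known. A small sketch using only Tables.jl:

```julia
using Tables

# A NamedTuple of equal-length vectors is a column table out of the box
nt = (a = [1.0, 2.0], b = [2.0, 3.0])

Tables.istable(nt)         # true
Tables.columns(nt) === nt  # true: already in column form, like the DataFrame
nt.a                       # [1.0, 2.0], analogous to getproperty(df, :a)
Tables.schema(nt)          # names (:a, :b), types (Float64, Float64)
```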
So, do I need to implement `Tables.columns` to return something that has `getproperty` defined on it? If so, I expected this to work:
```julia
Tables.columns(x::Vector{MyStruct}) = Dict(:a => [getproperty(x[i], :a) for i in 1:length(x)],
                                           :b => [getproperty(x[i], :b) for i in 1:length(x)])
```
But in fact it doesn’t:
```
julia> write_parquet("/home/myuser/lala.parquet", x)
ERROR: type Nothing has no field types
Stacktrace:
 [1] getproperty(::Nothing, ::Symbol) at ./Base.jl:33
 [2] write_parquet(::String, ::Array{MyStruct,1}; compression_codec::String) at /home/myuser/.julia/packages/Parquet/g6mqp/src/writer.jl:470
 [3] write_parquet(::String, ::Array{MyStruct,1}) at /home/myuser/.julia/packages/Parquet/g6mqp/src/writer.jl:465
 [4] top-level scope at REPL[20]:1
```
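The trace shows `write_parquet` calling `getproperty` on `nothing` while looking for a field named `types`, which suggests it asked for `Tables.schema` and got `nothing` back for the `Dict`. Assuming that is the cause, one sketch is to return a `NamedTuple` of vectors from `Tables.columns` instead, since Tables.jl already knows that object’s schema. It also defines the optional `Tables.schema` on the vector itself, in case the writer queries the original object rather than its columns (an assumption; Parquet.jl is not exercised here, so `write_parquet` is only shown as a comment).

```julia
using Tables

struct MyStruct
    a::Float64
    b::Float64
end

Tables.istable(::Type{<:Vector{MyStruct}}) = true
Tables.columnaccess(::Type{<:Vector{MyStruct}}) = true

# Return a NamedTuple of vectors (a Tables.jl column table) rather than a Dict,
# so that Tables.schema on the result reports names and element types
Tables.columns(x::Vector{MyStruct}) = (a = [s.a for s in x], b = [s.b for s in x])

# Optional: also advertise the schema on the vector itself (assumption: harmless if unused)
Tables.schema(::Vector{MyStruct}) = Tables.Schema((:a, :b), (Float64, Float64))

x = [MyStruct(1.0, 2.0), MyStruct(2.0, 3.0)]
cols = Tables.columns(x)
Tables.schema(cols)  # names (:a, :b), types (Float64, Float64)

# With Parquet.jl loaded, the original call would then be:
# write_parquet("/home/myuser/lala.parquet", x)
```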
So now I suspect the key point is that the returned object must be `AbstractColumns`-compatible. An alternative, then, is to define a new type, `MyStructTable`, which holds the same data as `Vector{MyStruct}` but as columns, and then define `Tables.getcolumn` and `Tables.columnnames` for it. Following [2], I’ve tried the following:
```julia
struct MyStructTable <: Tables.AbstractColumns
    names::Vector{Symbol}
    lookup::Dict{Symbol, Int}
    data::Vector{Vector{Float64}}
end
```
Now I need a constructor to build `MyStructTable` from `Vector{MyStruct}`. However, the natural definition throws an error when I call it in the REPL:

```julia
MyStructTable(x::Vector{MyStruct}) = MyStructTable([:a, :b], Dict(:a => 1, :b => 2),
                                                   [[getproperty(x[i], :a) for i in 1:length(x)],
                                                    [getproperty(x[i], :b) for i in 1:length(x)]])
```

So I can’t even get the constructor working:
```
julia> MyStructTable(x)
MyStructTable: Error showing value of type MyStructTable:
ERROR: StackOverflowError:
Stacktrace:
 [1] columnnames(::MyStructTable) at /home/myuser/.julia/packages/Tables/okt7x/src/Tables.jl:105
 [2] propertynames(::MyStructTable) at /home/myuser/.julia/packages/Tables/okt7x/src/Tables.jl:165
 ... (the last 2 lines are repeated 39990 more times)
 [79983] columnnames(::MyStructTable) at /home/myuser/.julia/packages/Tables/okt7x/src/Tables.jl:105
```
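The trace shows the loop: Tables.jl’s fallback `Tables.columnnames` calls `propertynames`, and `propertynames` on a `Tables.AbstractColumns` subtype calls `Tables.columnnames`, so displaying a `MyStructTable` before `Tables.columnnames` is defined for it recurses forever. Tables.jl also forwards `getproperty` (for example `m.data`) on such types to `Tables.getcolumn`, so methods on the type have to read its own fields with `getfield`. A minimal sketch of both points, using a hypothetical type `Demo` and a hypothetical helper `storedcols`:

```julia
using Tables

# Hypothetical minimal AbstractColumns subtype, used only to illustrate the two pitfalls
struct Demo <: Tables.AbstractColumns
    cols::Vector{Vector{Float64}}
end

# `d.cols` would be forwarded to Tables.getcolumn(d, :cols), so read the field with getfield
storedcols(d::Demo) = getfield(d, :cols)

# Defining columnnames breaks the columnnames <-> propertynames cycle
Tables.columnnames(d::Demo) = [:x, :y]
Tables.getcolumn(d::Demo, i::Int) = storedcols(d)[i]
Tables.getcolumn(d::Demo, nm::Symbol) = storedcols(d)[findfirst(==(nm), Tables.columnnames(d))]
Tables.getcolumn(d::Demo, ::Type{T}, i::Int, nm::Symbol) where {T} = storedcols(d)[i]

d = Demo([[1.0, 2.0], [3.0, 4.0]])
propertynames(d)  # [:x, :y] instead of a StackOverflowError
d.y               # [3.0, 4.0], via Tables.getcolumn(d, :y)
```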
Help?
[1] JuliaIO/Parquet.jl: Julia implementation of Parquet columnar file format reader, https://github.com/JuliaIO/Parquet.jl
[2] Tables.jl documentation, https://github.com/JuliaData/Tables.jl/blob/master/docs/src/index.md
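The constructor doesn’t work, but the idea was then to implement `AbstractColumns` by defining something like:

```julia
# m.data would be forwarded to Tables.getcolumn(m, :data) for an AbstractColumns subtype,
# so the stored fields are read with getfield
Tables.getcolumn(m::MyStructTable, ::Type{T}, col::Int, nm::Symbol) where {T} = getfield(m, :data)[col]
Tables.getcolumn(m::MyStructTable, nm::Symbol) = getfield(m, :data)[getfield(m, :lookup)[nm]]
Tables.getcolumn(m::MyStructTable, i::Int) = getfield(m, :data)[i]
Tables.columnnames(m::MyStructTable) = getfield(m, :names)
```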
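Putting the pieces together, here is a sketch of a complete `MyStructTable` modeled on the `MatrixTable` example in [2]. Fields are read through `getfield`-based accessors (the helper names `colnames`, `lookup` and `coldata` are hypothetical). The type declares itself a column-access table, `Tables.columns` returns the object itself, and `Tables.schema` reports `Float64` columns. This assumes Parquet.jl only needs `Tables.columns` plus a non-`nothing` `Tables.schema`; the `write_parquet` call is untested and shown only as a comment.

```julia
using Tables

struct MyStruct
    a::Float64
    b::Float64
end

struct MyStructTable <: Tables.AbstractColumns
    names::Vector{Symbol}
    lookup::Dict{Symbol, Int}
    data::Vector{Vector{Float64}}
end

# Build the columnar form from a vector of rows
MyStructTable(x::Vector{MyStruct}) =
    MyStructTable([:a, :b], Dict(:a => 1, :b => 2), [[s.a for s in x], [s.b for s in x]])

# Hypothetical accessors: getfield avoids Tables.jl's getproperty -> getcolumn forwarding
colnames(m::MyStructTable) = getfield(m, :names)
lookup(m::MyStructTable) = getfield(m, :lookup)
coldata(m::MyStructTable) = getfield(m, :data)

# Table-level interface
Tables.istable(::Type{<:MyStructTable}) = true
Tables.columnaccess(::Type{<:MyStructTable}) = true
Tables.columns(m::MyStructTable) = m
Tables.schema(m::MyStructTable) = Tables.Schema(colnames(m), fill(Float64, length(colnames(m))))

# AbstractColumns interface
Tables.getcolumn(m::MyStructTable, ::Type{T}, col::Int, nm::Symbol) where {T} = coldata(m)[col]
Tables.getcolumn(m::MyStructTable, nm::Symbol) = coldata(m)[lookup(m)[nm]]
Tables.getcolumn(m::MyStructTable, i::Int) = coldata(m)[i]
Tables.columnnames(m::MyStructTable) = colnames(m)

tbl = MyStructTable([MyStruct(1.0, 2.0), MyStruct(2.0, 3.0)])
tbl.b                    # [2.0, 3.0], via Tables.getcolumn
Tables.columntable(tbl)  # (a = [1.0, 2.0], b = [2.0, 3.0])
# With Parquet.jl loaded: write_parquet("/home/myuser/lala.parquet", tbl)
```

Because `MyStructTable` now defines `Tables.columnnames`, displaying it in the REPL no longer falls into the `columnnames`/`propertynames` loop.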