Sure, so by “table” I meant a “Tables.jl-compatible table”. That actually does not mean using a specific type (like a DataFrame), but rather having an object that follows the rules to be considered a column-table or a row table. These rules are spelled out in the Tables.jl docs but I think those docs are not always the easiest to follow if you’re not familiar with the system already.
Row tables
The essential idea is that row tables are things that iterate rows, and rows are objects that support getting the value of a specific column in that row (Tables.getcolumn, which falls back to getproperty or getfield) and giving the names of the columns (Tables.columnnames, which falls back to propertynames).
These fallbacks mean that a lot of things can be considered a row table automatically, without needing to use the Tables.jl package itself or any custom types. E.g. [(a=1, b=2), (a=3, b=4)] is a row table with two rows (the two NamedTuples in the vector) and two columns (a and b). Likewise a vector of structs is a row table. You can also “opt in” to the Tables interface by defining the Tables.getcolumn and Tables.columnnames methods for your row-like object.
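To make the row interface concrete, here is a minimal sketch. The first half only relies on the fallbacks described above; the second half opts in with a made-up DictRow type (a hypothetical name, not something from Tables.jl) whose values live in a Dict, so the property-based fallbacks would not find them.

```julia
using Tables

# A vector of NamedTuples is a row table through the fallbacks alone.
row_table = [(a = 1, b = 2), (a = 3, b = 4)]
first_row = first(row_table)
Tables.columnnames(first_row)     # falls back to propertynames
Tables.getcolumn(first_row, :b)   # falls back to getproperty

# Hypothetical row type: the values sit in a Dict, so we opt in explicitly.
struct DictRow
    data::Dict{Symbol, Any}
end

# Column names in a stable (sorted) order.
Tables.columnnames(r::DictRow) = sort!(collect(keys(r.data)))
# Look a value up by column name, or by position in the sorted names.
Tables.getcolumn(r::DictRow, nm::Symbol) = r.data[nm]
Tables.getcolumn(r::DictRow, i::Int) = r.data[Tables.columnnames(r)[i]]
```

With these methods defined, anything that iterates DictRows should behave as a row source for code that goes through Tables.getcolumn and Tables.columnnames.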
Column tables
Column tables are ordered collections of columns (that you can retrieve by name or index), and a column is an indexable collection with a known length (like a Vector, for example). The Tables.jl methods for column tables are the same as for a single row: Tables.getcolumn to get the column from the table (again falling back to getfield or getproperty), which now should return the column itself (an indexable collection) rather than a single element of the table, and Tables.columnnames (falling back to propertynames) to get the list of column names. So for example, (; a = [1,2,3], b = ["x", "y", "z"]) is a column table with two columns (a and b) and three rows. A DataFrame is also a column table.
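Here is a small sketch of the same calls on a column table, plus moving between the two layouts. I'm assuming Tables.jl is installed; Tables.rows, Tables.rowtable and Tables.columntable are the conversion helpers it provides.

```julia
using Tables

# A NamedTuple of vectors is a column table through the fallbacks.
col_table = (; a = [1, 2, 3], b = ["x", "y", "z"])

Tables.columnnames(col_table)      # names of the columns
Tables.getcolumn(col_table, :b)    # the whole column, looked up by name
Tables.getcolumn(col_table, 1)     # the whole column, looked up by index

# Iterate the same data row by row without copying it into a new layout.
for row in Tables.rows(col_table)
    println(Tables.getcolumn(row, :a), " ", Tables.getcolumn(row, :b))
end

# Materialize as a row table (vector of NamedTuples) and back again.
row_table = Tables.rowtable(col_table)
round_trip = Tables.columntable(row_table)
```

The point is that the data is the same either way; which layout you hold it in is mostly a matter of what is convenient to build.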
Returning to the example
So for your example, you could make a DataFrame with the x and y columns as you suggested and then write that out, or do something like:
    using CSV

    # Empty vector of NamedTuples with a fixed element type; this is the row table.
    my_table = @NamedTuple{x::String, y::Vector{Float64}}[]
    for i in 1:10
        x = "test"; y = rand(1)
        push!(my_table, (x = x, y = y))   # append one row
    end
    CSV.write("out.csv", my_table)
where I’ve used the @NamedTuple macro to define the element type of an empty vector to act as the table. I used Vector{Float64} since that’s what rand(1) returns; if you want just a single random number, use rand() and declare y as Float64 instead.
Or, depending on what you were doing, you could define a struct and use it like:
    using CSV

    struct MyRow
        x::String
        y::Vector{Float64}
    end

    my_table = MyRow[] # empty vector of `MyRow`s
    for i in 1:10
        x = "test"; y = rand(1)
        push!(my_table, MyRow(x, y))
    end
    CSV.write("out.csv", my_table)
One of the really nice things about the Tables.jl interface is that you can just use whatever representation makes the most sense for your problem at hand, and any packages supporting the interface will still work. So e.g. you could pass that vector of MyRows to DataFrames and you would get a DataFrame with an x and a y column, since the DataFrame constructor accepts Tables.jl tables.
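As a rough sketch of that last point (assuming DataFrames.jl is installed, and repeating the MyRow definition so the snippet stands on its own):

```julia
using DataFrames

struct MyRow
    x::String
    y::Vector{Float64}
end

# Build the row table directly with a comprehension instead of push! in a loop.
my_table = [MyRow("test", rand(1)) for _ in 1:10]

# The DataFrame constructor goes through the Tables.jl interface,
# so the struct fields should become the columns.
df = DataFrame(my_table)
names(df)   # column names as strings
nrow(df)    # number of rows
```

Here the y column holds one-element vectors because of rand(1); with rand() and a y::Float64 field it would be a plain column of Float64 values.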