How to save .string, .float data in .csv at the same time?

Sawyer_Zhou · July 4, 2021, 10:40am

I want to save float and string data to a CSV file at the same time. Here is a minimal example of my code:

using CSV
for i in 1:10
    x = "test"; y = rand(1)
    CSV.write("out.csv", (x = x, y = y),  append = true)    
end

I want it to output a table with strings “test” in the first column and random numbers in the second column.
However, I get the error message:
ArgumentError: 'NamedTuple{(:x, :y), Tuple{String, Vector{Float64}}}' iterates 'String' values, which doesn't satisfy the Tables.jl AbstractRow interface

Could someone give me some ideas on how to solve this problem?
Thanks!

ericphanson · July 4, 2021, 11:17am

That error means CSV.write expects a table, but you’re giving it a single row of a table instead. So the most minimal fix is to call

CSV.write("out.csv", [(x = x, y = y)],  append = true)

instead (since a collection of rows makes a table). However, it might be more performant to build the table first and then write it all at once, rather than reopening the file on every iteration.
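For reference, here is a minimal sketch of that one-line fix placed back into the original loop. The temporary path and the use of rand() (a single Float64) instead of rand(1) are my own assumptions for illustration, not part of the original question.

```julia
using CSV

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = joinpath(mktempdir(), "out.csv")

for i in 1:3
    row = (x = "test", y = rand())   # a single row as a NamedTuple
    # Wrapping the row in a vector turns it into a one-row table.
    CSV.write(path, [row]; append = true)
end

lines = readlines(path)
```

When appending, CSV.jl may skip the header line, so check the resulting file if you need the column names in it.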

Sawyer_Zhou · July 4, 2021, 11:49am

Thank you so much for your reply, it does work.
About the table-building approach you mentioned: is it based on DataFrame or Tables? Could you please recommend relevant documentation links? Thanks!

ericphanson · July 4, 2021, 12:24pm

Sure, so by “table” I meant a “Tables.jl-compatible table”. That does not mean using a specific type (like a DataFrame), but rather having an object that follows the rules to be considered a column table or a row table. These rules are spelled out in the Tables.jl docs, but I think those docs are not always the easiest to follow if you’re not already familiar with the system.

Row tables

The essential idea is that row tables are things that iterate rows, and rows are objects that support getting the specific value of a column in that row (Tables.getcolumn which falls back to getproperty or getfield), and giving the names of the columns (Tables.columnnames which falls back to propertynames).

These fallbacks mean that a lot of things can be considered a row table automatically without needing to use the Tables.jl package itself or any custom types. E.g. [(a=1, b=2), (a=3, b=4)] is a row table with two rows (the two NamedTuples in the vector) and two columns (a and b). Likewise a vector of structs is a row table. You can also “opt-in” to the Tables interface by defining the Tables.getcolumn and Tables.columnnames methods for your row-like object.
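As a rough sketch of the row-table idea, the snippet below pokes at a vector of NamedTuples with the Tables.jl accessors named above. The variable names (tbl, first_row) are made up for illustration.

```julia
using Tables

# A vector of NamedTuples: iterating it yields the rows.
tbl = [(a = 1, b = 2), (a = 3, b = 4)]

is_table = Tables.istable(tbl)

# Each row reports its column names and hands back values by name or index.
first_row = first(tbl)
names_of_row = Tables.columnnames(first_row)   # falls back to propertynames
b_value = Tables.getcolumn(first_row, :b)      # falls back to getproperty
a_value = Tables.getcolumn(first_row, 1)       # index access falls back to getfield
```

Nothing here needed a custom type, only the generic fallbacks.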

Column tables

Column tables are ordered collections of columns (that you can retrieve by name or index), and a column is an indexable collection with a known length (like a Vector, for example). The Tables.jl methods for column tables are the same as for a single row: Tables.getcolumn gets a column from the table (again falling back to getfield or getproperty), and now it should return the column itself, an indexable collection, rather than a single element of the table; Tables.columnnames (falling back to propertynames) gets the list of column names. So, for example, (; a = [1,2,3], b = ["x", "y", "z"]) is a column table with two columns (a and b) and three rows. A DataFrame is also a column table.
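The column-table side can be sketched the same way. This assumes only Tables.jl itself; Tables.rowtable and Tables.columntable are the package's converters between the two layouts, and the variable names are made up for illustration.

```julia
using Tables

# A NamedTuple of equal-length vectors is a column table.
col_tbl = (; a = [1, 2, 3], b = ["x", "y", "z"])

is_table = Tables.istable(col_tbl)
col_names = Tables.columnnames(col_tbl)   # falls back to propertynames
col_b = Tables.getcolumn(col_tbl, :b)     # returns the whole column, not one element

# Convert to a row table (a vector of NamedTuples) and back again.
row_tbl = Tables.rowtable(col_tbl)
back = Tables.columntable(row_tbl)
```

Either layout can be handed to a Tables.jl consumer such as CSV.write.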

Returning to the example

So for your example, you could make a DataFrame with the x and y columns as you suggested and then write that out, or do something like

using CSV
my_table = @NamedTuple{x::String, y::Vector{Float64}}[]
for i in 1:10
    x = "test"; y = rand(1)
    push!(my_table, (x = x, y = y))
end

CSV.write("out.csv",  my_table)

where I’ve used the @NamedTuple macro to define the element type of an empty vector to act as the table. I used Vector{Float64} since that’s what rand(1) returns; if you want just a single random number, use just rand().
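If a single random number per row is what you want, a variant along these lines might be closer. It is only a sketch: the element type uses y::Float64, and I write into an IOBuffer rather than a file purely so the resulting text is easy to inspect.

```julia
using CSV

# Empty row table whose rows hold a String and a scalar Float64.
my_table = @NamedTuple{x::String, y::Float64}[]
for i in 1:10
    push!(my_table, (x = "test", y = rand()))
end

# CSV.write also accepts an IO; replace `io` with "out.csv" to write a file instead.
io = IOBuffer()
CSV.write(io, my_table)
csv_text = String(take!(io))
```

Here the header line x,y is followed by one line per pushed row.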

Or, depending on what you were doing, you could define a struct and use it like

using CSV
struct MyRow
    x::String
    y::Vector{Float64}
end

my_table = MyRow[] # empty vector of `MyRow`s
for i in 1:10
    x = "test"; y = rand(1)
    push!(my_table, MyRow(x, y))
end

CSV.write("out.csv",  my_table)

One of the really nice things about the Tables.jl interface is that you can just use whatever representation makes the most sense for your problem at hand, and any packages supporting the interface will still work. So e.g. you could pass that vector of MyRows to DataFrames and you would get a DataFrame with an x and y column, since the DataFrame constructor accepts Tables.jl tables.
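As a sketch of that last point, assuming DataFrames.jl is installed, the same kind of vector of MyRows can be passed straight to the DataFrame constructor (three rows here just to keep it short):

```julia
using DataFrames

struct MyRow
    x::String
    y::Vector{Float64}
end

# A plain vector of structs, built the same way as in the loop above.
my_table = [MyRow("test", rand(1)) for _ in 1:3]

# The field names of MyRow become the column names.
df = DataFrame(my_table)
```

Note that y is then a column whose entries are one-element vectors, since that is what rand(1) produces.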

Sawyer_Zhou · July 4, 2021, 3:34pm

Wow, thank you very much for your detailed reply. It is really, really helpful.
Thank you!

