Writing to DataFrames and CSV files

Smith · March 20, 2020, 12:55am

Hi, I am looking to write output of a function for different runs to one CSV file. Using the DelimitedFiles package, I can write results to separate CSV files for each run. However, I want to write results from all runs to one CSV file either in separate rows or separate columns.

As an example, I have given some code below. This will create separate CSV file for each run.

To write all results to one CSV file I use CSV.write("TestCSVPack.csv", DataFrame(result)) but I get the following error message
ArgumentError: ‘Base.Generator{UnitRange{Int64},var"#13#14"{Array{Int64,1}}}’ iterates ‘Int64’ values, which doesn’t satisfy the Tables.jl AbstractRow interface

using CSV, DataFrames, DelimitedFiles
function Addition(x,y,i)
    return x[:,i]+y[:,i]
end
x = [1 2 3; 6 8 9]
y = [1 3 1; 10 12 5]
for i = 1:size(x,2)
    result = Addition(x,y,i)
# use DelimitedFiles to create a separate CSV file for each run
    writedlm( "TestDelimited$i.csv", result)
# use CSV package to create a CSV file for all results. The following line will give an error
   CSV.write("TestCSVPack.csv", DataFrame(result))
end

Is there a simple way to fix it, either using the CSV package or another package capable of writing CSV files?

alejandromerchan · March 20, 2020, 2:07am

Maybe not exactly what you want, but this works

open("myfile.txt", "w") do io
    for i = 1:size(x,2)
        result = Addition(x,y,i)
        write(io, string(result))
    end
end

Played some with CSV and DataFrames, but couldn’t make it work thus far.

jling · March 20, 2020, 2:27am

one thing you can do is vcat all the results together before writing

Tamas_Papp · March 20, 2020, 6:21am

I would concatenate the results into a DataFrame, also keeping track of the i indexes for later use. Something like

julia> mapreduce(i -> (result = addition(x, y, i);
       DataFrame(i = fill(i, length(result)), result = result)),
       vcat, 1:size(x, 2))
6×2 DataFrame
│ Row │ i     │ result │
│     │ Int64 │ Int64  │
├─────┼───────┼────────┤
│ 1   │ 1     │ 2      │
│ 2   │ 1     │ 16     │
│ 3   │ 2     │ 5      │
│ 4   │ 2     │ 20     │
│ 5   │ 3     │ 4      │
│ 6   │ 3     │ 14     │

which you can then save using CSV.write.

Skoffer · March 20, 2020, 8:05am

There are two different problems in your approach.
First one is the error message you receive, and you see it because there is no DataFrame constructor of the column, since typeof(result) is Vector{Int64} . You can avoid it by either giving explicit name to this column, or by turning it into Matrix transposing it. For example

result = Addition(x, y, 1)

# Option 1

julia> DataFrame(:x => result)
2×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 2     │
│ 2   │ 16    │

# Option 2
julia> DataFrame(result')
1×2 DataFrame
│ Row │ x1    │ x2    │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 16    │

Second problem is writing to the same file many times, it can be solved by using append flag.

using CSV
using DataFrames

function Addition(x,y,i)
    return x[:,i]+y[:,i]
end
x = [1 2 3; 6 8 9]
y = [1 3 1; 10 12 5]
for i = 1:size(x,2)
    result = Addition(x,y,i)
    CSV.write("TestCSVPack.csv", DataFrame(:x => result), append = i != 1, writeheader = false)
end

with content of TestCSVPack.csv

sh> cat TestCSVPack.csv
2
16
5
20
4
14

As additional note, if your data is homogeneous as in your example, you may omit DataFrames since they introduce some overhead and use other Tables.jl sinks, for example (here I also write row by row)

using CSV
using Tables

for i = 1:size(x,2)
    result = Addition(x,y,i)
   CSV.write("TestCSVPack.csv", Tables.table(result'), append = i != 1, writeheader = false)
end

with contents

sh> cat TestCSVPack.csv
2,16
5,20
4,14

johnh · March 20, 2020, 8:19am

Showing my ignorance here… would it be worth considering other file formats?
for instance HDF5 GitHub - JuliaIO/HDF5.jl: Save and load data in the HDF5 file format from Julia

Smith · March 20, 2020, 12:35pm

Thanks. This works fine for the example I gave. I am not sure if this approach requires storing results from all runs in to an array before writing to a dataframe. So it may not work significantly large sized problems.

Smith · March 20, 2020, 12:51pm

Thanks. Your suggestion is perfect for my needs as I think it does not require holding results from all the runs in the memory at the same time. Can I check how can I record i (run number) in to the CSV as in the response given by Tamas?

For e.g. using the code below records i but I cannot work out how it can be used based on your suggestion

df=mapreduce(i -> (result = Addition(x, y, i); DataFrame(I= fill(i, length(result)), Result = result)),vcat, 1:size(x, 2))
CSV.write("TestCSVPack.csv", df)

This produces the following DataFrame

6×2 DataFrame
│ Row │ I     │ Result │
│     │ Int64 │ Int64  │
├─────┼───────┼────────┤
│ 1   │ 1     │ 2      │
│ 2   │ 1     │ 16     │
│ 3   │ 2     │ 5      │
│ 4   │ 2     │ 20     │
│ 5   │ 3     │ 4      │
│ 6   │ 3     │ 14     │

Smith · March 20, 2020, 1:03pm

Certainly it is worth considering other file formats. I am not familiar with HDF5 format but it looks interesting. I cannot find much documentation with this package so it may take me sometime to work out how to write results for the example given in the question.

Skoffer · March 20, 2020, 2:26pm

You should proceed in the same way as was proposed by Tamas, just add new column in resulting DataFrame

for i = 1:size(x,2)
    result = Addition(x,y,i)
    CSV.write("TestCSVPack.csv", DataFrame(:n => i, :x => result), append = i != 1, writeheader = false)
end

N.B.: as you can see, I’ve used just i, not fill(i, length(result)) because DataFrame constructor auto cycle this column, but both versions are valid. Also, you can remove writeheader argument if you want to, because CSV does not create additional header rows in append mode.

Smith · March 20, 2020, 3:40pm

Thanks.
Lastly, how can I add headings to the CSV /DataFrame i.e. to have “i” and “Result” as headings for the data (to appear once at the top)?

Using this code would write headings for each run, but I want it to appear once
CSV.write("TestCSVPack.csv", DataFrame(I = i, Result = result), append = i != 1, writeheader = true)

|I|Result|
|1|2|
|1|16|
|I|Result|
|2|5|
|2|20|
|I|Result|
|3|4|
|3|14|

Skoffer · March 20, 2020, 4:40pm

Just remove writeheader completely.

Smith · March 20, 2020, 5:21pm

Thank you

Smith · March 20, 2020, 9:16pm

Thanks for looking in to jt

lungben · August 24, 2020, 12:38pm

Sorry for re-opening this old thread, but it seems that HDF5.jl does not directly work together with DataFrames. The package HDFTables.jl, which seems to be such an interface, is not maintained anymore, as far as I can see.
Is there a simple way to open HDF files (saved in Pandas) in Julia DataFrames?

Topic		Replies	Views
Saving outputs in CSV file General Usage csv	20	6947	August 10, 2020
Dataframe functional input and output formats in julia 1.0, august 2018 Data	2	1344	August 21, 2018
How do I export strings to CSV file？ General Usage question	8	1275	September 19, 2020
Combining dataframe and csv: undefined functions in DataFrames Data question , dataframes	5	729	May 18, 2021
Streaming Tables to CSV New to Julia csv , tables	6	826	April 25, 2023

Writing to DataFrames and CSV files

Related topics