Writing to DataFrames and CSV files

Hi, I am looking to write output of a function for different runs to one CSV file. Using the DelimitedFiles package, I can write results to separate CSV files for each run. However, I want to write results from all runs to one CSV file either in separate rows or separate columns.

As an example, I have given some code below. This will create separate CSV file for each run.

To write all results to one CSV file I use CSV.write("TestCSVPack.csv", DataFrame(result)) but I get the following error message
ArgumentError: β€˜Base.Generator{UnitRange{Int64},var"#13#14"{Array{Int64,1}}}’ iterates β€˜Int64’ values, which doesn’t satisfy the Tables.jl AbstractRow interface

using CSV, DataFrames, DelimitedFiles
function Addition(x,y,i)
    return x[:,i]+y[:,i]
end
x = [1 2 3; 6 8 9]
y = [1 3 1; 10 12 5]
for i = 1:size(x,2)
    result = Addition(x,y,i)
# use DelimitedFiles to create a separate CSV file for each run
    writedlm( "TestDelimited$i.csv", result)
# use CSV package to create a CSV file for all results. The following line will give an error
   CSV.write("TestCSVPack.csv", DataFrame(result))
end

Is there a simple way to fix it, either using the CSV package or another package capable of writing CSV files?

Maybe not exactly what you want, but this works

open("myfile.txt", "w") do io
    for i = 1:size(x,2)
        result = Addition(x,y,i)
        write(io, string(result))
    end
end

Played some with CSV and DataFrames, but couldn’t make it work thus far.

one thing you can do is vcat all the results together before writing

1 Like

I would concatenate the results into a DataFrame, also keeping track of the i indexes for later use. Something like

julia> mapreduce(i -> (result = addition(x, y, i);
       DataFrame(i = fill(i, length(result)), result = result)),
       vcat, 1:size(x, 2))
6Γ—2 DataFrame
β”‚ Row β”‚ i     β”‚ result β”‚
β”‚     β”‚ Int64 β”‚ Int64  β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 1     β”‚ 2      β”‚
β”‚ 2   β”‚ 1     β”‚ 16     β”‚
β”‚ 3   β”‚ 2     β”‚ 5      β”‚
β”‚ 4   β”‚ 2     β”‚ 20     β”‚
β”‚ 5   β”‚ 3     β”‚ 4      β”‚
β”‚ 6   β”‚ 3     β”‚ 14     β”‚

which you can then save using CSV.write.

There are two different problems in your approach.
First one is the error message you receive, and you see it because there is no DataFrame constructor of the column, since typeof(result) is Vector{Int64} . You can avoid it by either giving explicit name to this column, or by turning it into Matrix transposing it. For example

result = Addition(x, y, 1)

# Option 1

julia> DataFrame(:x => result)
2Γ—1 DataFrame
β”‚ Row β”‚ x     β”‚
β”‚     β”‚ Int64 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 2     β”‚
β”‚ 2   β”‚ 16    β”‚

# Option 2
julia> DataFrame(result')
1Γ—2 DataFrame
β”‚ Row β”‚ x1    β”‚ x2    β”‚
β”‚     β”‚ Int64 β”‚ Int64 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 2     β”‚ 16    β”‚

Second problem is writing to the same file many times, it can be solved by using append flag.

using CSV
using DataFrames

function Addition(x,y,i)
    return x[:,i]+y[:,i]
end
x = [1 2 3; 6 8 9]
y = [1 3 1; 10 12 5]
for i = 1:size(x,2)
    result = Addition(x,y,i)
    CSV.write("TestCSVPack.csv", DataFrame(:x => result), append = i != 1, writeheader = false)
end

with content of TestCSVPack.csv

sh> cat TestCSVPack.csv
2
16
5
20
4
14

As additional note, if your data is homogeneous as in your example, you may omit DataFrames since they introduce some overhead and use other Tables.jl sinks, for example (here I also write row by row)

using CSV
using Tables

for i = 1:size(x,2)
    result = Addition(x,y,i)
   CSV.write("TestCSVPack.csv", Tables.table(result'), append = i != 1, writeheader = false)
end

with contents

sh> cat TestCSVPack.csv
2,16
5,20
4,14
2 Likes

Showing my ignorance here… would it be worth considering other file formats?
for instance HDF5 https://github.com/JuliaIO/HDF5.jl

Thanks. This works fine for the example I gave. I am not sure if this approach requires storing results from all runs in to an array before writing to a dataframe. So it may not work significantly large sized problems.

Thanks. Your suggestion is perfect for my needs as I think it does not require holding results from all the runs in the memory at the same time. Can I check how can I record i (run number) in to the CSV as in the response given by Tamas?

For e.g. using the code below records i but I cannot work out how it can be used based on your suggestion

df=mapreduce(i -> (result = Addition(x, y, i); DataFrame(I= fill(i, length(result)), Result = result)),vcat, 1:size(x, 2))
CSV.write("TestCSVPack.csv", df)

This produces the following DataFrame

6Γ—2 DataFrame
β”‚ Row β”‚ I     β”‚ Result β”‚
β”‚     β”‚ Int64 β”‚ Int64  β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 1     β”‚ 2      β”‚
β”‚ 2   β”‚ 1     β”‚ 16     β”‚
β”‚ 3   β”‚ 2     β”‚ 5      β”‚
β”‚ 4   β”‚ 2     β”‚ 20     β”‚
β”‚ 5   β”‚ 3     β”‚ 4      β”‚
β”‚ 6   β”‚ 3     β”‚ 14     β”‚

Certainly it is worth considering other file formats. I am not familiar with HDF5 format but it looks interesting. I cannot find much documentation with this package so it may take me sometime to work out how to write results for the example given in the question.

You should proceed in the same way as was proposed by Tamas, just add new column in resulting DataFrame

for i = 1:size(x,2)
    result = Addition(x,y,i)
    CSV.write("TestCSVPack.csv", DataFrame(:n => i, :x => result), append = i != 1, writeheader = false)
end

N.B.: as you can see, I’ve used just i, not fill(i, length(result)) because DataFrame constructor auto cycle this column, but both versions are valid. Also, you can remove writeheader argument if you want to, because CSV does not create additional header rows in append mode.

1 Like

Thanks.
Lastly, how can I add headings to the CSV /DataFrame i.e. to have β€œi” and β€œResult” as headings for the data (to appear once at the top)?

Using this code would write headings for each run, but I want it to appear once
CSV.write("TestCSVPack.csv", DataFrame(I = i, Result = result), append = i != 1, writeheader = true)

|I|Result|
|1|2|
|1|16|
|I|Result|
|2|5|
|2|20|
|I|Result|
|3|4|
|3|14|

Just remove writeheader completely.

Thank you

Thanks for looking in to jt

Sorry for re-opening this old thread, but it seems that HDF5.jl does not directly work together with DataFrames. The package HDFTables.jl, which seems to be such an interface, is not maintained anymore, as far as I can see.
Is there a simple way to open HDF files (saved in Pandas) in Julia DataFrames?