If I have a table with 36 columns and 500,000 plus rows, is there an established method for saving the table as a .CSV for use in other applications?
Thanks
The CSV package would do.
I thought there wasn't DataStreams support yet… can you give an example?
Is this what you want?
CSV.write("summary-cluster-runs.csv", DataFrame(N=10, w=10.0); append=true)
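If you need to write a whole table at once rather than appending a single row, a minimal sketch of the usual pattern (the df name and columns here are made up) would be:

using CSV, DataFrames

# Hypothetical stand-in for the real 500,000-row table
df = DataFrame(x = rand(500_000), y = rand(500_000))

# append defaults to false, so the header is written and the file is created/overwritten
CSV.write("table.csv", df)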
If the question is about a JuliaDB table, I think the easiest way to save it and load it in a bunch of formats is using FileIO and IterableTables:
julia> using JuliaDB, IterableTables, FileIO
julia> t = table(rand(10), rand(10), names = [:x, :y]);
julia> save("test.csv", t)
julia> table(load("test.csv"))
Table with 10 rows, 2 columns:
x y
────────────────────
0.0819593 0.259326
0.861373 0.380602
0.633562 0.148728
0.574784 0.0540917
0.978486 0.0148579
0.257681 0.126109
0.242756 0.302516
0.533261 0.0320025
0.900969 0.870229
0.173699 0.332354
I believe it uses TextParse under the hood to read and write the CSV, which should be very fast.
You may need very recent versions of all of these packages though, I’m on:
- JuliaDB 0.4.2 master
- IterableTables 0.5.2+ master
- FileIO 0.6.1
EDIT: I’ve just tested this method and I get:
julia> @time save("test.csv", df)
3.201264 seconds (28.33 M allocations: 645.473 MiB, 3.88% gc time)
on an example df I had with approximately 500,000 rows and 25 columns, so it should be fast enough for your use case as well.
Thanks. Yes, I am looking to write a JuliaDB table to CSV.
For this example to work, I also had to install CSVFiles v0.4.0.
julia> using JuliaDB,IterableTables,FileIO
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:3
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:6
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:7
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:10
julia> t = table(rand(10),rand(10),names= [:x,:y]);
julia> save("test.csv",t)
WARNING: both FileIO and JuliaDB export "save"; uses of it in module Main must be qualified
ERROR: UndefVarError: save not defined
julia> FileIO.save("test.csv",t)
Error encountered while saving "test.csv".
Fatal error:
Library "CSVFiles" is not installed but is recommended as a library to load format: ".csv"
Should we install "CSVFiles" for you? (y/n):
INFO: Start installing CSVFiles...
INFO: Cloning cache of CSVFiles from https://github.com/davidanthoff/CSVFiles.jl.git
INFO: Installing CSVFiles v0.4.0
INFO: Building MbedTLS
Using system libraries...
INFO: Package database updated
julia> FileIO.save("test.csv",t)
INFO: Recompiling stale cache file C:\Users\sdmca\.julia\lib\v0.6\HTTP.ji for module HTTP.
You’re right, I forgot to mention that FileIO is some sort of “metapackage” which will require specific packages to handle specific formats (CSVFiles for CSV, FeatherFiles for Feather, ExcelFiles for Excel and so on). The advantage is that the interface is simple and always the same and you choose the format with the extension of the filename.
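A minimal sketch of what that looks like in practice, assuming both CSVFiles and FeatherFiles are installed (the file names are made up):

using JuliaDB, IterableTables, FileIO

t = table(rand(10), rand(10), names = [:x, :y])

FileIO.save("test.csv", t)      # dispatched to CSVFiles via the .csv extension
FileIO.save("test.feather", t)  # dispatched to FeatherFiles via the .feather extension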
This also works:
open("test.csv","w") do fid
println(fid,join(colnames(t),','))
for i in collect(t)
println(fid,join(i,','))
end
end
julia> t = table(rand(100000),rand(100000),names= [:x,:y],chunks=1);
julia> @time open("text.csv","w") do fid
println(fid,join(colnames(t),','))
for i in collect(t)
println(fid,join(i,','))
end
end
0.198821 seconds (1.60 M allocations: 58.118 MiB, 10.27% gc time)
This is a bit of a thread resurrection, but is there a more up-to-date way of doing this?
using FileIO, IterableTables, CSVFiles
FileIO.save("test1.csv", grouped)
>Error encountered while saving "test1.csv".
>Fatal error:
I think CSVFiles is not needed, but this should still work. Why are you asking?
Trying that results in the error "Can't write this data to a CSV file."
I found a solution though:
using JuliaDB, DataFrames, CSV

collected = collect(dindexedtable)   # needed if it's distributed
df = DataFrame(values(collected))    # convert the table's values to a DataFrame
names!(df, [:col1name, :col2name])   # set the column names
CSV.write("data/savedfile.csv", df)
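In more recent versions of CSV.jl, CSV.write can also consume any Tables.jl-compatible source directly, so the DataFrame detour may not be needed. An untested sketch, assuming the table implements the Tables.jl interface:

using JuliaDB, CSV

t = table(rand(10), rand(10), names = [:x, :y])

# CSV.write accepts any Tables.jl source, so the table can be written without converting
CSV.write("data/savedfile.csv", t)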
Please open an issue on GitHub and describe the data set that cannot be saved to file. Ideally, provide a self-contained minimal working example of the error!