JuliaDB - Saving to CSV

If I have a table with 36 columns and 500,000 plus rows, is there an established method for saving the table as a .CSV for use in other applications?

Thanks

The CSV package would do.

I thought there wasn't DataStreams support yet. Can you give an example?

Is this what you want?

CSV.write("summary-cluster-runs.csv",DataFrame(N=10,w=10.0);append=true)

If the question is about a JuliaDB table, I think the easiest way to save it and load it in a bunch of formats is using FileIO and IterableTables:

julia> using JuliaDB, IterableTables, FileIO

julia> t = table(rand(10), rand(10), names = [:x, :y]);

julia> save("test.csv", t)

julia> table(load("test.csv"))
Table with 10 rows, 2 columns:
x          y
────────────────────
0.0819593  0.259326
0.861373   0.380602
0.633562   0.148728
0.574784   0.0540917
0.978486   0.0148579
0.257681   0.126109
0.242756   0.302516
0.533261   0.0320025
0.900969   0.870229
0.173699   0.332354

I believe it uses TextParse under the hood to read and write the CSV, which should be very fast.
You may need very recent versions of all of these packages though. I’m on:

 - JuliaDB                       0.4.2              master
 - IterableTables                0.5.2+             master
 - FileIO                        0.6.1

EDIT: I’ve just tested this method and I get:

julia> @time save("test.csv", df)
  3.201264 seconds (28.33 M allocations: 645.473 MiB, 3.88% gc time)

on an example df I had with approx. 500,000 rows and 25 columns, so it should be fast enough for your use case too.


Thanks. Yes, I am looking to write a JuliaDB table to CSV.

For this example to work, I also had to install CSVFiles v0.4.0.

julia> using JuliaDB,IterableTables,FileIO
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
  likely near no file:3
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
  likely near no file:6
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
  likely near no file:7
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
  likely near no file:10

julia> t = table(rand(10),rand(10),names= [:x,:y]);

julia> save("test.csv",t)
WARNING: both FileIO and JuliaDB export "save"; uses of it in module Main must be qualified
ERROR: UndefVarError: save not defined

julia> FileIO.save("test.csv",t)
Error encountered while saving "test.csv".
Fatal error:
Library "CSVFiles" is not installed but is recommended as a library to load format: ".csv"
Should we install "CSVFiles" for you? (y/n):


INFO: Start installing CSVFiles...
INFO: Cloning cache of CSVFiles from https://github.com/davidanthoff/CSVFiles.jl.git
INFO: Installing CSVFiles v0.4.0
INFO: Building MbedTLS
Using system libraries...
INFO: Package database updated


julia> FileIO.save("test.csv",t)
INFO: Recompiling stale cache file C:\Users\sdmca\.julia\lib\v0.6\HTTP.ji for module HTTP.

You’re right, I forgot to mention that FileIO is some sort of “metapackage” which will require specific packages to handle specific formats (CSVFiles for CSV, FeatherFiles for Feather, ExcelFiles for Excel and so on). The advantage is that the interface is simple and always the same and you choose the format with the extension of the filename.
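
To make the "format chosen by extension" idea concrete, here is a toy sketch using only Base Julia. It is not how FileIO is implemented; `write_delimited`, `WRITERS` and `toy_save` are hypothetical names made up for this illustration.

```julia
# Toy illustration of choosing a writer from the file extension.
# Everything here (write_delimited, WRITERS, toy_save) is hypothetical and is
# not part of FileIO.

# Write a header plus rows, joining fields with the given delimiter.
function write_delimited(path, header, rows, delim)
    open(path, "w") do io
        println(io, join(header, delim))
        for row in rows
            println(io, join(row, delim))
        end
    end
end

# Map each supported extension to the function that handles it.
const WRITERS = Dict(
    ".csv" => (path, header, rows) -> write_delimited(path, header, rows, ','),
    ".tsv" => (path, header, rows) -> write_delimited(path, header, rows, '\t'),
)

# Look up the writer by extension and fail clearly for unknown formats.
function toy_save(path, header, rows)
    ext = lowercase(splitext(path)[2])
    haskey(WRITERS, ext) || error("no writer registered for \"$ext\"")
    WRITERS[ext](path, header, rows)
end

# Same call, different format, selected only by the filename.
dir = mktempdir()
toy_save(joinpath(dir, "test.csv"), [:x, :y], [(1, 2.5), (3, 4.5)])
toy_save(joinpath(dir, "test.tsv"), [:x, :y], [(1, 2.5), (3, 4.5)])
```

The caller never names the format explicitly, which is the convenience described above; the cost is that a handler for each extension has to be available.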

This also works:

open("test.csv","w") do fid
    println(fid,join(colnames(t),','))
    for i in collect(t)
        println(fid,join(i,','))
    end
end

julia> t = table(rand(100000),rand(100000),names= [:x,:y],chunks=1);

julia> @time open("text.csv","w") do fid
           println(fid,join(colnames(t),','))
           for i in collect(t)
               println(fid,join(i,','))
           end
       end
  0.198821 seconds (1.60 M allocations: 58.118 MiB, 10.27% gc time)
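
One caveat with the loop above: it writes every field verbatim, so a string column containing a comma, a double quote or a newline would produce a broken file. A minimal sketch of a quoting-aware variant, using only Base Julia, could look like this. `csv_field` and `write_csv` are hypothetical helper names, and I'm assuming the rows are tuples or named tuples like the ones `collect(t)` yields.

```julia
# Quote a field only when it contains a comma, a double quote, or a line
# break; embedded double quotes are doubled, as in RFC 4180.
function csv_field(x)
    s = string(x)
    if occursin(r"[\",\r\n]", s)
        return "\"" * replace(s, "\"" => "\"\"") * "\""
    end
    return s
end

# Write a header and rows to `path`. `rows` can be any iterable of tuples or
# named tuples (assumption: this matches what iterating the table gives you).
function write_csv(path, header, rows)
    open(path, "w") do io
        println(io, join(map(csv_field, header), ','))
        for row in rows
            println(io, join(map(csv_field, Tuple(row)), ','))
        end
    end
end

# Example with fields that need quoting.
path = joinpath(mktempdir(), "test.csv")
rows = [
    (name = "a", note = "plain"),
    (name = "b", note = "has, comma"),
    (name = "c", note = "say \"hi\""),
]
write_csv(path, [:name, :note], rows)
```

For purely numeric tables like the `rand` example this makes no difference, so the simpler loop is fine there.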

This is a bit of a thread resurrection, but is there a more up-to-date way of doing this?

using FileIO, IterableTables, CSVFiles
FileIO.save("test1.csv", grouped)
>Error encountered while saving "test1.csv".
>Fatal error:

I think CSVFiles is not needed. But this should still work, why are you asking?

Trying that results in the error "Can't write this data to a CSV file."

I found a solution though:

using JuliaDB, DataFrames, CSV

collected = collect(dindexedtable) # needed if it's distributed
df = DataFrame(values(collected))
names!(df, [:col1name, :col2name]) # newer DataFrames versions use rename! instead
CSV.write("data/savedfile.csv", df)

Please open an issue on GitHub and describe the data set that cannot be saved to file. Ideally, provide a self-contained minimal working example that reproduces the error!
