If I have a table with 36 columns and 500,000 plus rows, is there an established method for saving the table as a .CSV for use in other applications?
Thanks
The CSV package would do.
I thought there wasn't DataStreams support yet… can you give an example?
Is this what you want?
CSV.write("summary-cluster-runs.csv", DataFrame(N=10, w=10.0); append=true)
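If you need to write a whole table at once rather than appending a single row, a minimal sketch of the usual pattern (the df name and columns here are made up) would be:

using CSV, DataFrames

# Hypothetical stand-in for the real 500,000-row table
df = DataFrame(x = rand(500_000), y = rand(500_000))

# append defaults to false, so the header is written and the file is created/overwritten
CSV.write("table.csv", df)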
If the question is about a JuliaDB table, I think the easiest way to save it and load it in a bunch of formats is using FileIO and IterableTables:
julia> using JuliaDB, IterableTables, FileIO
julia> t = table(rand(10), rand(10), names = [:x, :y]);
julia> save("test.csv", t)
julia> table(load("test.csv"))
Table with 10 rows, 2 columns:
x y
────────────────────
0.0819593 0.259326
0.861373 0.380602
0.633562 0.148728
0.574784 0.0540917
0.978486 0.0148579
0.257681 0.126109
0.242756 0.302516
0.533261 0.0320025
0.900969 0.870229
0.173699 0.332354
I believe it uses TextParse under the hood to read and write the CSV, which should be very fast.
You may need very recent versions of all of these packages though, I’m on:
- JuliaDB 0.4.2 master
- IterableTables 0.5.2+ master
- FileIO 0.6.1
EDIT: I’ve just tested this method and I get:
julia> @time save("test.csv", df)
3.201264 seconds (28.33 M allocations: 645.473 MiB, 3.88% gc time)
on an example df I had with approximately 500,000 rows and 25 columns, so it should be fast enough for your use case as well.
Thanks. Yes, I am looking to write a JuliaDB table to CSV.
For this example to work, I also had to install CSVFiles v0.4.0.
julia> using JuliaDB,IterableTables,FileIO
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:3
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:6
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:7
WARNING: JuliaDB.DTable is deprecated, use JuliaDB.DNDSparse{K, V} where V where K instead.
likely near no file:10
julia> t = table(rand(10),rand(10),names= [:x,:y]);
julia> save("test.csv",t)
WARNING: both FileIO and JuliaDB export "save"; uses of it in module Main must be qualified
ERROR: UndefVarError: save not defined
julia> FileIO.save("test.csv",t)
Error encountered while saving "test.csv".
Fatal error:
Library "CSVFiles" is not installed but is recommended as a library to load format: ".csv"
Should we install "CSVFiles" for you? (y/n):
INFO: Start installing CSVFiles...
INFO: Cloning cache of CSVFiles from https://github.com/davidanthoff/CSVFiles.jl.git
INFO: Installing CSVFiles v0.4.0
INFO: Building MbedTLS
Using system libraries...
INFO: Package database updated
julia> FileIO.save("test.csv",t)
INFO: Recompiling stale cache file C:\Users\sdmca\.julia\lib\v0.6\HTTP.ji for module HTTP.
You’re right, I forgot to mention that FileIO is some sort of “metapackage” which will require specific packages to handle specific formats (CSVFiles for CSV, FeatherFiles for Feather, ExcelFiles for Excel and so on). The advantage is that the interface is simple and always the same and you choose the format with the extension of the filename.
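A minimal sketch of what that looks like in practice, assuming both CSVFiles and FeatherFiles are installed (the file names are made up):

using JuliaDB, IterableTables, FileIO

t = table(rand(10), rand(10), names = [:x, :y])

FileIO.save("test.csv", t)      # dispatched to CSVFiles via the .csv extension
FileIO.save("test.feather", t)  # dispatched to FeatherFiles via the .feather extension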
This also works:
open("test.csv","w") do fid
println(fid,join(colnames(t),','))
for i in collect(t)
println(fid,join(i,','))
end
end
julia> t = table(rand(100000),rand(100000),names= [:x,:y],chunks=1);
julia> @time open("text.csv","w") do fid
println(fid,join(colnames(t),','))
for i in collect(t)
println(fid,join(i,','))
end
end
0.198821 seconds (1.60 M allocations: 58.118 MiB, 10.27% gc time)
This is a bit of a thread resurrection, but is there a more up-to-date way of doing this?
using FileIO, IterableTables, CSVFiles
FileIO.save("test1.csv", grouped)
>Error encountered while saving "test1.csv".
>Fatal error:
I think CSVFiles is not needed, but this should still work. Why are you asking?
Trying that results in the error "Can't write this data to a CSV file."
I found a solution though:
using JuliaDB, DataFrames, CSV

collected = collect(dindexedtable)   # needed if it's distributed
df = DataFrame(values(collected))    # convert the table's values to a DataFrame
names!(df, [:col1name, :col2name])   # set the column names
CSV.write("data/savedfile.csv", df)
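In more recent versions of CSV.jl, CSV.write can also consume any Tables.jl-compatible source directly, so the DataFrame detour may not be needed. An untested sketch, assuming the table implements the Tables.jl interface:

using JuliaDB, CSV

t = table(rand(10), rand(10), names = [:x, :y])

# CSV.write accepts any Tables.jl source, so the table can be written without converting
CSV.write("data/savedfile.csv", t)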
Please open an issue on GitHub and describe the data set that cannot be saved to file. Ideally, provide a self-contained minimal working example of the error!