Thanks. Now I am trying to get something like this working:
using CSV, SQLite

function read_into_db(csv_path, db, tablename, CSV_args)
    tmp = tempname()
    println(" decompressing into $(tmp) (may take a few minutes) ...")
    # decompress and replace decimal commas with dots, streaming into a temporary file
    open(io -> run(pipeline(pipeline(`zcat $(csv_path)`, `tr , .`), stdout=io)), tmp, "w")
    println(" filesize is $(round(filesize(tmp) / 2^30, 2)) GiB")
    println(" ... forming CSV.Source")
    src = CSV.Source(tmp; CSV_args...)
    println(" ... loading into database")
    SQLite.load(db, tablename, src)
    println(" ... removing temporary file")
    rm(tmp)
end
The fix from @joshbode should help with the decompression and comma replacement. However, CSV.Source seems to insist on traversing the whole file first, so it is essentially a two-pass operation. Is there anything I can do about that, to make it single-pass?
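One thing I am considering (not sure it is the right approach) is to skip CSV.Source entirely and stream the rows straight into SQLite with a prepared statement, so the data is only touched once. A minimal sketch, assuming a fixed set of columns, a ';' delimiter, everything stored as TEXT, and that SQLite.jl's Stmt/bind!/execute! API works the way I remember:

using SQLite

# Single-pass sketch: read the (already decompressed) file line by line and
# insert each row with a prepared statement inside one transaction.
# The column names, the ';' delimiter, and the SQLite.Stmt / SQLite.bind! /
# SQLite.execute! calls are assumptions here; header-line handling and type
# conversion (everything is bound as a string) are deliberately omitted.
function stream_into_db(path, db, tablename, colnames)
    cols = join(colnames, ", ")
    qs = join(fill("?", length(colnames)), ", ")
    SQLite.execute!(db, "CREATE TABLE IF NOT EXISTS $(tablename) ($(cols))")
    stmt = SQLite.Stmt(db, "INSERT INTO $(tablename) VALUES ($(qs))")
    SQLite.execute!(db, "BEGIN TRANSACTION")
    open(path) do io
        for line in eachline(io)
            fields = split(line, ';')
            for (i, f) in enumerate(fields)
                SQLite.bind!(stmt, i, String(f))
            end
            SQLite.execute!(stmt)
        end
    end
    SQLite.execute!(db, "COMMIT")
end

The same loop could in principle read directly from the zcat/tr pipeline instead of the temporary file, which would also remove the intermediate copy on disk, but I have not tried that.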
In general, what is the best strategy for dealing with data that does not fit into memory in the current Julia ecosystem? If I read the data into an SQLite.jl database, can I then dump it column by column with Feather.jl (each column on its own fits in memory)?
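To make that second question concrete, this is the kind of thing I have in mind: pull one column at a time out of SQLite and write each to its own Feather file. The column and file names are placeholders, and I am assuming SQLite.query returns something the DataFrame constructor accepts:

using SQLite, DataFrames, Feather

# Hypothetical column-wise dump: each column is queried separately (so only
# one column has to fit in memory at a time) and written to its own Feather file.
db = SQLite.DB("mydata.sqlite")          # placeholder path
for col in ["price", "volume"]           # placeholder column names
    df = DataFrame(SQLite.query(db, "SELECT $(col) FROM mytable"))
    Feather.write("mytable_$(col).feather", df)
end

Is something along these lines reasonable, or is there a better-supported pattern for out-of-core work?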