Using JuliaDB to create larger-than-memory datasets and work with them?

The JuliaDB docs explain how to perform some basic operations out-of-core, on data larger than memory:
http://juliadb.org/latest/manual/out-of-core.html
But it seems they only cover loading large inputs and producing small results.

How can I produce large results as well, not just consume large inputs?
I’m using Julia 1.0.2 on Windows 10.

Imagine I want to do something like this:

using DataFrames
N = 3
myDT = DataFrame(group = repeat('A':'C', outer = N), x = 1:(3*N))  # create a DataFrame
myDT.y = myDT.x .* rand(3*N)        # add a new column y
myDT[myDT.group .== 'A', :y] .= 0   # zero out y where group == 'A'

but with a much larger N, too large to fit in memory. (Or, for example, create two large matrices, multiply them, and save the result.)
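For the matrix variant, one workaround I'm aware of (outside JuliaDB) is to memory-map the result with the Mmap standard library, so the product never has to fit in RAM at once. A rough sketch with made-up sizes and a hypothetical file name:

using Mmap, LinearAlgebra

m, k, n = 100_000, 100, 100_000              # made-up sizes; C is ~80 GB
io = open("C.bin", "w+")
C = Mmap.mmap(io, Matrix{Float64}, (m, n))   # disk-backed result matrix
A = rand(m, k); B = rand(k, n)               # inputs assumed to fit in RAM
mul!(C, A, B)                                # writes the product into the mapped file
Mmap.sync!(C); close(io)

But I'd prefer a JuliaDB-level answer for the tabular case.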

How can I do this with JuliaDB for N on the order of 10^9, saving the result to disk?

I’ve tried

using JuliaDB
N=10^9
table((group = repeat('A':'C', outer = N), x = 1:(3*N)))

but it consumes all my RAM and produces the error
ERROR: OutOfMemoryError()

Folks, I know it's late, but any news on this post? :upside_down_face:

AFAICT, this is not at all a JuliaDB issue. Try just running repeat('A':'C', outer = 10^9) on a system that has only 4 GB of RAM; it will likely throw the exact same error, because that call alone allocates ~12 GB (3×10^9 Chars at 4 bytes each). The data must be fully materialized before it even gets to a JuliaDB function.
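For what it's worth, that particular repeat has a lazy equivalent that allocates nothing up front (the names here are just for illustration):

N = 10^9
groups = Iterators.take(Iterators.cycle('A':'C'), 3 * N)   # A, B, C, A, B, C, … lazily
collect(Iterators.take(groups, 6))                         # materialize only what you ask for

The problem is getting a lazy source like that into a table without collecting it first.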

Maybe the way to handle this is to make an iterator and have table support “unrolling” iterables while it writes them to disk; I’m not sure if that’s currently supported, but I doubt it’d be a hard PR.
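Until something like that exists, the unrolling can be done by hand: generate the rows lazily, write them out in pieces that each fit in RAM, and then use the out-of-core ingestion documented on the manual page linked above (loadtable with the output and chunks keywords). A sketch, with the chunk count and paths made up:

using JuliaDB

N = 10^9
nchunks = 100                  # made-up: each piece must fit in RAM
len = 3 * N ÷ nchunks          # rows per piece
mkpath("csv_chunks")
for c in 1:nchunks
    open(joinpath("csv_chunks", "chunk_$c.csv"), "w") do io
        println(io, "group,x")
        for i in ((c - 1) * len + 1):(c * len)
            println(io, 'A' + (i - 1) % 3, ",", i)   # same values as repeat('A':'C', outer = N) and 1:3N
        end
    end
end
files = [joinpath("csv_chunks", "chunk_$c.csv") for c in 1:nchunks]
t = loadtable(files; output = "bin", chunks = nchunks)   # parses into an on-disk binary store

This never holds more than one chunk's worth of data in memory at a time.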


> Maybe the way to handle this is to make an iterator and have table support “unrolling” iterables while it writes them to disk; I’m not sure if that’s currently supported, but I doubt it’d be a hard PR.

It's exactly this point about iterators that I was wondering about: has anyone already done a PR for this, or does someone have a hack to share?
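To make the goal concrete, what I'd hope to run on such an out-of-core table t is roughly the DataFrames example from the top, folded into one pass (map and save are existing JuliaDB functions; whether they stream well at this scale is exactly my question):

t2 = map(r -> (group = r.group, x = r.x,
               y = r.group == 'A' ? 0.0 : r.x * rand()), t)   # add y, zeroed where group == 'A'
save(t2, "result_bin")                                        # keep the equally large result on disk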