Using JuliaDB to create larger-than-memory datasets and work with them?


#1

The JuliaDB docs explain how to perform some basic operations out-of-core, on data larger than memory:
http://juliadb.org/latest/manual/out-of-core.html
But it seems they only cover loading large data and producing small results.

How can I get large results too (not only the input)?
I’m using Julia 1.0.2 on Windows 10.

Imagine I want to do something like this:

using DataFrames
N=3
myDT = DataFrame(group = repeat('A':'C',outer=N), x = 1:(3*N) ) # create a dataframe
myDT.y = myDT.x .* rand(3*N) # add a new column y
myDT[myDT.group .== 'A', :y] = 0 # Replace y values when group == 'A'

but with a much larger N, too large to fit in memory. (Or, for example, create two large matrices, multiply them, and save the result.)

How can I do it with JuliaDB for N larger than 10^9 and save it on disk?

I’ve tried

using JuliaDB
N=10^9
table((group = repeat('A':'C',outer=N), x = 1:(3*N) ))

but it consumes all my RAM and fails with this error:
ERROR: OutOfMemoryError()
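
For a sense of scale, here is a rough estimate of what the attempt above needs. It assumes a 64-bit Julia build, where a Char takes 4 bytes and an Int takes 8 bytes. The `repeat` call allocates a full Vector{Char}, so the group column alone comes to roughly 11 GiB. Whether the range `1:(3*N)` also gets materialized inside `table` is an assumption on my part; if it does, that roughly triples the total.

```julia
N = 10^9
rows = 3 * N                        # number of rows in the table

# repeat('A':'C', outer=N) returns a Vector{Char} with one entry per row
group_bytes = rows * sizeof(Char)

# 1:(3*N) is a lazy range; this is its size only if it gets collected
# into a Vector{Int} (an assumption, not something I have verified)
x_bytes = rows * sizeof(Int)

println("group column: ", round(group_bytes / 2^30, digits=1), " GiB")
println("x column (if collected): ", round(x_bytes / 2^30, digits=1), " GiB")
```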