Let’s say I have some storage containing table-like data. For example, it is a single big binary file with a serialized array of StoredRow structures:
```julia
struct StoredRow
    a::Float64
    b::Float64
    c::UInt16
end
```
I also have a function that reads a chunk of data from the storage, given a range of row indices, and converts it into an array of StoredRow structures:

```julia
function readfromstorage(storage, range::UnitRange{Int64})::Vector{StoredRow}
```
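For concreteness, here is a minimal sketch of what such a function could look like, assuming the rows are stored back-to-back in the file with Julia's in-memory layout (so `sizeof(StoredRow)`, padding included, is the record size). The file layout and the `io`-based signature are my assumptions, not part of the question:

```julia
# Hypothetical sketch: rows stored back-to-back with Julia's in-memory
# layout; sizeof(StoredRow) includes the struct's trailing padding.
struct StoredRow
    a::Float64
    b::Float64
    c::UInt16
end

function readfromstorage(io::IO, range::UnitRange{Int64})::Vector{StoredRow}
    seek(io, (first(range) - 1) * sizeof(StoredRow))
    out = Vector{StoredRow}(undef, length(range))
    read!(io, out)      # bulk-read raw bytes into the isbits vector
    return out
end

# demo: write 100 rows to a temp file, then read rows 11:20 back
vec = [StoredRow(rand(), rand(), UInt16(i)) for i in 1:100]
path, io = mktemp()
write(io, vec)          # raw write works because StoredRow is isbits
close(io)
rows = open(f -> readfromstorage(f, 11:20), path)
```

Since `StoredRow` is an isbits type, `read!` and `write` can move whole chunks as raw bytes, so a chunk read is a single `seek` plus one bulk read.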
If the table were small, I could convert it into a table (DataFrame, StructArray, or IndexedTable) and work with it directly:
```julia
# some small array:
vec = [StoredRow(rand(), rand(), i) for i = 1:100]

using DataFrames
df = DataFrame(vec)

using StructArrays, JuliaDB
s = StructArray(vec)
t = table(fieldarrays(s))
```
But how can I create a similar table object if the storage data does not fit in memory?
Can I simply attach a data source (or some other abstract interface type) to a table object, so that it lazily reads data from the source in chunks (and maybe caches the most recently read results)?
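To illustrate the kind of thing I have in mind, here is a rough sketch of a lazy `AbstractVector` that fetches fixed-size chunks through `readfromstorage` on demand and caches the last chunk. The `LazyRows` type, the chunking scheme, and the in-memory stand-in for the storage backend are all hypothetical:

```julia
# Hypothetical sketch of a chunk-caching lazy vector over a storage backend.
struct StoredRow
    a::Float64
    b::Float64
    c::UInt16
end

# toy "storage": an in-memory vector standing in for the real backend
readfromstorage(storage, r::UnitRange{Int64}) = storage[r]

mutable struct LazyRows{S} <: AbstractVector{StoredRow}
    storage::S
    len::Int
    chunksize::Int
    cached::UnitRange{Int64}       # row range currently held in cache
    cache::Vector{StoredRow}
end

LazyRows(storage, len; chunksize = 1024) =
    LazyRows(storage, len, chunksize, 1:0, StoredRow[])

Base.size(v::LazyRows) = (v.len,)

function Base.getindex(v::LazyRows, i::Int)
    @boundscheck checkbounds(v, i)
    if i ∉ v.cached                          # cache miss: fetch the chunk holding row i
        lo = i - (i - 1) % v.chunksize
        hi = min(lo + v.chunksize - 1, v.len)
        v.cached = lo:hi
        v.cache = readfromstorage(v.storage, v.cached)
    end
    return v.cache[i - first(v.cached) + 1]
end

# demo
data = [StoredRow(rand(), rand(), UInt16(i)) for i in 1:100]
lazy = LazyRows(data, length(data); chunksize = 16)
```

My question is essentially whether the existing table packages already support plugging in something like this, or whether converting to a `DataFrame`/`StructArray`/`table` would still materialize all rows in memory anyway.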