I’m going to announce a package soon based on these ideas. But for a preview, see
https://github.com/dgleich/NumbersFromText.jl
Here is some sample code.
```julia
using NumbersFromText
M = readmatrix("myfile.txt") # reads a matrix of data
M = readmatrix(Int, "myfile.txt") # reads a matrix of Ints
m = readarray("myfile.txt") # just reads a list of Float64s from myfile.txt
m = readarray(Int, "myfile.txt") # just reads a list of Ints from myfile.txt
m = readarray!("myfile.txt", rand(Int, 5)) # read Ints into an existing array
aint, afloat = readarrays("myfile.txt", Int, Float64) # reads alternating Ints and Float64s
aint, afloat = readarrays!("myfile.txt", rand(Int, 5), rand(Float64, 5)) # read into existing arrays
```
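To make "alternating Ints and Floats" concrete, here is a rough Base-Julia sketch of that input layout and the two output vectors it would produce. `read_alternating` is a hypothetical helper written only for illustration. It is not the NumbersFromText.jl implementation and it does not use that package's API. It assumes whitespace-separated tokens of the form `int float int float ...`.

```julia
# Hypothetical helper, NOT part of NumbersFromText.jl: split whitespace-separated
# tokens and parse them alternately as Int and Float64.
function read_alternating(io::IO)
    ints = Int[]
    floats = Float64[]
    # Gather every whitespace-separated token in the stream.
    tokens = split(read(io, String))
    isodd(length(tokens)) && error("expected an even number of values")
    for i in 1:2:length(tokens)
        push!(ints, parse(Int, tokens[i]))        # odd positions -> Int
        push!(floats, parse(Float64, tokens[i+1])) # even positions -> Float64
    end
    return ints, floats
end

# Convenience method: open the file, parse it, and close it again.
read_alternating(filename::AbstractString) = open(read_alternating, filename)

aint, afloat = read_alternating(IOBuffer("1 2.5\n3 4.0\n5 -0.25"))
```

This sketch loads the whole stream into a `String` before splitting, which keeps it short. A fast reader would more likely parse bytes directly from the stream without building intermediate strings.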
Everything works with IO streams as well.
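In Julia, a reader written against the abstract `IO` type accepts a file, an in-memory `IOBuffer`, or a pipe without changes. An in-memory buffer is also a convenient way to time parsing separately from disk reads. The following minimal sketch illustrates the idea. `read_ints` is a hypothetical name, not the package's API.

```julia
# Hypothetical helper, NOT the NumbersFromText.jl API: parse every
# whitespace-separated integer from any IO stream, line by line.
function read_ints(io::IO)
    out = Int[]
    for line in eachline(io)
        for tok in split(line)          # blank lines yield no tokens
            push!(out, parse(Int, tok))
        end
    end
    return out
end

# Same code path for an in-memory buffer...
v = read_ints(IOBuffer("10 20\n30\n\n-40 50\n"))

# ...and for a file on disk (`open` closes the file when read_ints returns).
path, fh = mktemp()
write(fh, "1 2 3\n4 5\n")
close(fh)
w = open(read_ints, path)
rm(path)
```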
In my in-memory processing tests, this is about 2x faster than CSV.jl, which is otherwise the fastest I’ve seen.
I can read about 32 million integers in about 2.7-2.9 seconds on my computer (so roughly 11M integers/sec). Note that reading from disk is still not the bottleneck: this data is about 700MB, so we need only about 250MB/sec, which isn’t hard from an SSD. (These numbers were done quickly, so I apologize if I made a mistake.)
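As a sanity check on the throughput figures, this back-of-envelope calculation uses the quoted values: about 32 million integers, about 700MB of text, and 2.7-2.9 seconds. It assumes MB means 10^6 bytes. With MiB the rate would be slightly higher.

```julia
# Back-of-envelope throughput from the quoted figures.
n_ints = 32e6          # about 32 million integers
bytes  = 700e6         # about 700MB of text (assuming 1 MB = 10^6 bytes)
times  = (2.7, 2.9)    # measured wall-clock range in seconds

ints_per_sec = n_ints ./ times          # integers parsed per second
mb_per_sec   = bytes ./ times ./ 1e6    # MB of input consumed per second

println("integers/sec (millions): ", round.(ints_per_sec ./ 1e6; digits=1))
println("MB/sec:                  ", round.(mb_per_sec; digits=0))
```

Both ends of the timing range come out at roughly 11-12 million integers/sec and roughly 240-260MB/sec of input. A modern SSD can typically sustain that sequential read rate.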
I’m still hunting for bugs, so be warned.