ANN: Feather.jl v0.4.0 (lazy edition)

ExpandingMan · August 28, 2018, 11:23pm

Hello all, it’s finally time to announce something that we’ve been working on and using extensively for a good while now: the new Feather.jl. (We still have to fix our docs; see latest for the v0.4.0 docs.)

For those of you who are not aware, feather is a lightweight binary tabular format. The most exciting thing about this new release is that it now supports completely lazy loading of tables via memory mapping. For example:

df = Feather.read("a_table_that_is_3_GB.feather")  # this happens almost instantly

# this *only* reads the first few rows, it doesn't matter how big your file is, this operation should take approximately the same amount of time
head(df)

# the lazy loading is completely seamless. if you perform an operation, *only* the data you are asking for gets touched
# the following *only* reads in data from column_A
mean(df.column_A) 

# the following *only* reads in rows 100 to 110 from column_B
mean(df.column_B[100:110])

# or *only* rows 100 through 110 of the entire dataframe
df[100:110, :]

# but you are still just working with ordinary DataFrames
typeof(df) == DataFrame

(Note that on Windows memory mapping is disabled by default, so you should do Feather.read(filename, use_mmap=true) to use memory mapping on Windows.)
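
For anyone unfamiliar with memory mapping, here is a minimal sketch of the general idea using only Julia's `Mmap` and `Statistics` standard libraries. It is not how Feather.jl is implemented internally; the raw binary file, `path`, `n` and `col` are all made up for illustration, and the file simply stands in for a single column of `Float64` values.

```julia
using Mmap, Statistics

# Hypothetical stand-in for one column of a table: a raw binary file of Float64s.
path = tempname()
n = 1_000_000
open(path, "w") do io
    write(io, collect(1.0:n))
end

# Map the file into memory. This returns quickly regardless of file size,
# because no data is read until elements are actually accessed.
col = Mmap.mmap(path, Vector{Float64}, n)

# Indexing a small range only needs the part of the file that backs it.
m = mean(col[100:110])
println(m)
```

The mapped `col` behaves like an ordinary `Vector{Float64}`, which is roughly why lazily loaded columns can sit inside a normal `DataFrame`.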

One caveat is that the feather metadata does not yet support tables larger than 4 GB (ultimately these will involve multiple files). Fortunately, this should be relatively simple to implement, so if they don’t come up with a standard we’ll eventually have our own solution.

tk3369 · August 29, 2018, 5:03am

Good stuff! Just curious why you decided to disable memory mapping on Windows by default.

ExpandingMan · August 29, 2018, 1:23pm

See here.

As memory mapping is now a central feature of feather, I suspect this may have to be revisited at some point.
