Hi, I am working with data sets that have very many columns. One small example has dimensions (1452, 66584) and is about 2 GB in memory; converted to the Feather format, its size drops to 778 MB. The problem is that Feather.jl is unexpectedly slow on the first read, with a very large memory allocation (it failed several times on ACF because of memory issues), and unexpectedly fast after that. Here is the output of a successful run on my local machine:
```julia
julia> using Feather

julia> @time al=Feather.read("DO_gm_ofa_unadj_alpr_ch1.feather");
1850.519921 seconds (17.67 G allocations: 363.156 GiB, 1.80% gc time)

julia> @time al=Feather.read("DO_gm_ofa_unadj_alpr_ch1.feather");
  3.102820 seconds (17.18 M allocations: 575.066 MiB, 7.20% gc time)
```
```julia
julia> versioninfo()
Julia Version 1.0.5
Commit 3af96bcefc (2019-09-09 19:06 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
```
This is not the only file I need to handle; it is one of several files I work with together. Do you have any ideas for reading data with this many columns quickly?