I am excited to announce that DLMReader version 0.4.5 has been released. The new release includes several performance enhancements and bug fixes. For instance, reading multiple observations per line and type detection now allocate less and run faster.
However, the biggest enhancement in the 0.4.5 release is a significant reduction in the time to first read. On macOS and Linux, I have reduced the package latency by more than a factor of 4 when reading small files with homogeneous columns. On Windows we use Parsers.jl for parsing floats (see issue #5), which adds a little lag compared to other operating systems, but Windows users will still notice a significant reduction in the time to first read.
To achieve this, I have implemented a new algorithm for parsing small files. It consumes more memory than the existing high-performance algorithm, but since it is only used for small files, the extra memory usage is unobtrusive. By default, the filereader function uses the new algorithm for files smaller than 64 MiB and automatically switches to the high-performance algorithm for larger files.
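As a rough illustration of the size-based switch described above, here is a minimal sketch in plain Julia. The names `SMALL_FILE_THRESHOLD` and `choose_algorithm` are hypothetical and are not part of DLMReader's API; the package's actual internals may look quite different. The sketch only shows the idea of picking a strategy from the input size.

```julia
# Hypothetical sketch: these names are illustrative, not DLMReader's API.
const SMALL_FILE_THRESHOLD = 64 * 1024^2  # 64 MiB expressed in bytes

# Return a symbol naming the strategy that would be picked for `nbytes` bytes.
# Inputs strictly smaller than the threshold take the small-file path.
function choose_algorithm(nbytes::Integer; threshold::Integer = SMALL_FILE_THRESHOLD)
    return nbytes < threshold ? :small_file : :high_performance
end

# Convenience method for a file path; `filesize` comes from Julia's Base.
choose_algorithm(path::AbstractString; kwargs...) =
    choose_algorithm(filesize(path); kwargs...)

println(choose_algorithm(1_000))        # a tiny input
println(choose_algorithm(64 * 1024^2))  # an input exactly at the threshold
```

With a strict `<` comparison, an input of exactly 64 MiB already falls on the high-performance side, which matches the "less than 64 MiB" wording above.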
The following code demonstrates the latency of the new release in a fresh Julia session (macOS, Julia 1.7.3). Note that more than 20% of the reported time is spent compiling/inferring non-DLMReader functions, so in real-life scenarios users can expect even lower latency.
julia> jcmd = Base.julia_cmd();
julia> cmd = """
start_time = time()
using DLMReader
filereader(IOBuffer("x1,x2\n1,2\n"))
println(time() - start_time)
""";
julia> run(`$jcmd -e $cmd`); # 2.135 sec (homogeneous types)
julia> cmd = """
start_time = time()
using DLMReader
filereader(IOBuffer("x1,x2\n1,2.0\n"))
println(time() - start_time)
""";
julia> run(`$jcmd -e $cmd`); # 2.644 sec (heterogeneous types)
julia> cmd = """
using InMemoryDatasets
ds = Dataset(rand(10,2), :auto)
start_time = time()
using DLMReader
filereader(IOBuffer("x1,x2\n1,2\n"))
println(time() - start_time)
""";
julia> run(`$jcmd -e $cmd`); # 1.699 sec (homogeneous types, real-life scenario)