Hello all,
We’re pleased to present the 0.9 release of CSV.jl.
Highlights include:
- Big internals refactoring to simplify code, make maintenance/future feature work easier, and make current features a little more robust in terms of multithreading/performance
- Support for two new custom string types built into parsing: InlineStrings and PosLenStrings. InlineStrings are a fixed-width string type defined as a `primitive type`, which allows for inline storage in `Vector`s and various processing efficiencies from that representation (though they may end up taking more space when a column has high variance in string lengths). PosLenStrings are a new “lazy string” representation: a reference to the original CSV input is kept, and each PosLenString just points to a range of bytes in the source. Together these allow more flexibility, depending on use case/workload, to avoid the excessive allocations that can result from using regular `String`s on very large files
- Cleanup, review, and simplification of the keyword arguments supported by `CSV.File`/`CSV.Rows`
- Automatic support for reading gzipped inputs, with decompression done either to a temporary file (the default) or in memory (by passing `buffer_in_memory=true`), as well as support for writing gzip-compressed outputs in `CSV.write` by passing `compress=true`
- Big overhaul of the CSV.jl docs, including a massively updated “intro” section that walks through all the various APIs provided by the package (turns out people have a hard time knowing about really useful things when they aren’t well documented!); big thanks to all those who contributed feedback (and code) to help improve overall documentation
- New functionality to pass a `Vector` of inputs (be they file names, IO objects, byte vectors, etc.) to `CSV.File`; they’ll all be vertically concatenated and returned as a single `CSV.File` object (this currently has strict requirements that the schemas of all inputs match, though this will be relaxed in the future)
This shouldn’t be a breaking release at all, but several deprecations were introduced in the keyword argument cleanup. There were also some additional restrictions/type constraints placed on various arguments, so if you do notice things not working that worked before, please open an issue and we can figure out how to fix/support.
Why isn’t this the 1.0 release??
Good question: we’re really close. We wanted to provide one more release with a number of deprecations to allow for a transition period. With the big internals refactoring, the new string support, and the overall amount of work that went into this release, we also want to let bugs shake out a bit and iron out the wrinkles before putting out a more polished 1.0 release in the very near future. Barring any unexpected issues, the only breaking change currently planned for 1.0 is the removal of the deprecated keyword arguments.
Thanks again to all those willing to provide feedback, file issues, or just express appreciation on the repo or in the #data Slack channel. We look forward to hearing from you!
-Jacob Quinn & JuliaData maintainers