- Big internals refactoring to simplify code, make maintenance/future feature work easier, and make current features a little more robust in terms of multithreading/performance
- Support for two new custom string types built into parsing: InlineStrings and PosLenStrings. InlineStrings are a fixed-width string type defined as a primitive type, which allows for inline storage in Vectors and various processing efficiencies from that representation (but which may end up taking more space when a column has higher variance in the length of strings). PosLenStrings are a new “lazy string” representation where a reference to the original CSV input is kept and a PosLenString just points to the range of bytes in the source. These allow more flexibility, depending on use case and workload, to avoid the excessive allocations that can result when using regular Strings on very large files
- Cleanup, review, and simplification of keyword arguments supported by
- Automatic support for reading gzipped inputs, with the ability for decompression to be done to a temporary file (the default) or in memory (by passing buffer_in_memory=true), as well as support for writing gzip compressed outputs in
- Big overhaul of the CSV.jl docs, including a massively updated “intro” section that walks through all the various APIs provided by the package (turns out people have a hard time knowing about really useful things when they aren’t well documented!); big thanks to all those who contributed feedback (and code) to help improve overall documentation
- New functionality to pass a Vector of inputs (be they file names, IO objects, byte vectors, etc.) to CSV.File and they’ll all be vertically concatenated and returned as a single CSV.File object (currently has strict requirements on matching schemas from all inputs, but this will be relaxed in the future)
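To make the new string types, multi-input reading, and gzip support above a bit more concrete, here is a minimal sketch. The sample data is made up for illustration, and the `stringtype` keyword of `CSV.File` and the `compress=true` keyword of `CSV.write` are taken from my reading of the CSV.jl docs rather than from this post, so treat them as assumptions and check them against the version you have installed. It also assumes InlineStrings and WeakRefStrings are available in your environment.

```julia
using CSV
using InlineStrings: InlineString      # fixed-width, inline-stored strings
using WeakRefStrings: PosLenString     # lazy strings pointing into the source bytes

csv = "id,name\n1,alice\n2,bob\n"      # made-up sample data

# Same input parsed with each string representation
# (`stringtype` is assumed from the CSV.jl docs, not from this post).
f_inline = CSV.File(IOBuffer(csv); stringtype=InlineString)
f_lazy   = CSV.File(IOBuffer(csv); stringtype=PosLenString)
f_plain  = CSV.File(IOBuffer(csv); stringtype=String)

# A Vector of inputs is vertically concatenated into a single CSV.File.
# Numeric-only columns are used here so both inputs have identical schemas.
parts = [IOBuffer("id,score\n1,1.5\n2,2.5\n"), IOBuffer("id,score\n3,3.5\n")]
f_all = CSV.File(parts)

# Gzip round trip: write compressed output, then read it back while
# decompressing in memory instead of to a temporary file.
path = joinpath(mktempdir(), "scores.csv.gz")
CSV.write(path, (id=[1, 2], score=[1.5, 2.5]); compress=true)
f_gz = CSV.File(path; buffer_in_memory=true)
```

All three string representations compare equal to ordinary strings, so downstream code that only reads values should not need to care which one was chosen; the choice mainly affects memory layout and allocation behavior.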
This shouldn’t be a breaking release at all, but several deprecations were introduced in the keyword argument cleanup. There were also some additional restrictions/type constraints placed on various arguments, so if you do notice things not working that worked before, please open an issue and we can figure out how to fix/support.
Good question: we’re really close. We wanted to provide one more release with this number of deprecations to allow a transition period. With the big internals refactoring, new string support, and overall amount of work that went into this release, we also just want to let bugs shake out a bit, iron out the wrinkles, and then put out a more polished 1.0 release in the very near future. Barring any unexpected issues, 1.0 is currently planned to be breaking only in that it removes the deprecated keyword arguments.
-Jacob Quinn & JuliaData maintainers