I’m pleased to announce a new release of the CSV.jl package.
This release provides notable improvements in several areas, including performance, additional features, and enhanced flexibility.
- “Perfect” column typing: gone are the days of `rows_for_type_detect` and parsing getting messed up after 10K rows. CSV.jl now gets column types right every time, and without needing to restart parsing.
- Auto delimiter detection: don’t worry about keeping track of which file has which delimiter; CSV.jl will figure it out for you!
- Better automatic handling of invalid files: invalid values? wrong number of values on a row? CSV.jl will handle such files gracefully, printing helpful messages about anything unexpected it runs into
- Improved performance: great care has been taken to improve performance on several levels: underlying type parsers (provided by Parsers.jl), better data locality and cache friendliness, and greater use of custom Julia structures for efficiency
- Enhanced APIs for the `CSV.File` type: in addition to allowing iteration over rows directly, it now provides `getproperty` to access efficient read-only columns of the underlying data. If mutable columns are needed, you can
- And lots and lots of examples in documentation!
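
To make a few of the points above concrete, here is a minimal sketch of reading a small file with `CSV.File`, then using both column access via `getproperty` and row iteration. The sample data, the temporary file, and the column names `id`, `name`, and `score` are made up for illustration. No `delim` keyword is passed, on the assumption that delimiter detection picks up the `;` on its own.

```julia
using CSV

# Write a small semicolon-delimited file to a temporary path.
# The contents are hypothetical sample data, not taken from the release notes.
path = tempname()
write(path, "id;name;score\n1;alice;3.5\n2;bob;4.0\n")

# No `delim` is given here; this relies on automatic delimiter detection.
f = CSV.File(path)

# Column access through getproperty: each property is a whole column.
ids = f.id
scores = f.score

# Row iteration: each row exposes its values by column name.
for row in f
    println(row.id, " ", row.name, " ", row.score)
end
```

If detection ever guesses wrong for a particular file, passing the delimiter explicitly (for example `CSV.File(path; delim=';')`) should sidestep the guess.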
These improvements are in addition to many smaller bugfixes and quality-of-life enhancements. Great effort has gone into ensuring CSV.jl provides a rich set of features, comparable to or better than other world-class CSV parsers (see the feature comparison table below!).
As always, please open issues as you run into bugs or performance issues and we’ll try to address things as quickly as possible. Cheers!
|Custom open/close quote characters|
|Skip/offset rows to parse|
|Limit rows to parse|
|Manually provide column names|
|Multiple rows as column names|
|Perfect type inference w/o restarting|
|Manually specify column types|
|Specify arbitrary missing strings|
|“Normalize” column names|
|Skip rows at end of file||(python engine only)|
|Ignore commented rows|
|Handle rows w/ too few/many columns|
|Read a file transposed|
|Custom decimal separator for floats|
|Custom Bool string values|
|“pool” string column values|
|Control over invalid values|
|Select/drop specific columns|
|Apply transform functions|
|Iteration over rows|
|Able to parse Date/DateTime values|
|Support reading any IO object|
|Progress meter while parsing|
|Non-UTF-8 encoded files|