uCSV.jl is a µ-sized package for working with delimited text. Its default behavior is similar to readcsv from Base, but it is extensible enough to cover just about everything you’d expect from more established parsers in other languages. It supports Julia 0.6 (the current release), and its only dependency is Nulls.jl. The package can be found here and the documentation here.
I wrote uCSV.jl because I routinely hit the limitations of existing parsers and didn’t have the time or expertise to understand their codebases well enough to extend them. I wanted something smaller that stuck to functions from Base Julia for robustness. When it didn’t work, I wanted it to return detailed error messages explaining how to fix the problem. I think it will appeal to anyone who has trouble with existing packages, and to those using the writetable functions in the latest release of DataFrames (which are scheduled for deprecation and removal in the near future).
It does not support other missing-data representations out of the box (DataArrays & NA, NullableArrays and Nullables, or DataValueArrays and DataValues). If anyone would like to use it with those formats and has trouble doing so, please file an issue and I’d be happy to help. Additionally, if anyone has general parsing problems, questions, or suggestions for improvement, please open an issue. The package currently has 100% code coverage. It is tested against a diverse set of more than 75 delimited-text files, hand-curated for ugliness; this is the most complete testing suite I’ve found for any parser, regardless of language. In the coming days and weeks, I plan to extend the tests to cover a curated list of additional ugly datasets from RDatasets, to make sure I haven’t missed anything. Everything it can’t handle (that I’m aware of) is documented in the manual, along with suggested workarounds.
To place uCSV.jl in context, existing CSV parsing packages include CSV.jl and TextParse.jl, both of which are actively developed and very capable. In the medium-to-long term (think of the Julia 1.0 release timeline), I aim to explore how well uCSV.jl can complement these tools rather than compete with them. I see the primary strength of CSV.jl as its tight integration with the DataStreams.jl ecosystem for streaming and converting table-like data between formats. I see the primary strength of TextParse.jl as its very memory-efficient generated parsers. I hope uCSV.jl can connect these two frameworks (DataStreams and TextParse’s generated parsers) to make both of them more powerful and accessible to the community. That would be more useful than duplicating the CSV-parsing APIs they both currently offer. If anyone has pointers, advice, or interest in helping with this, feel free to open issues and PRs!
I’d also like to give a shoutout to the JuliaData team for their mentorship over the past several months; without it, this package wouldn’t exist. This package is also indebted to the user communities of JuliaStats and JuliaData. The issues they opened about other parsers served as the initial testing suite I used to build this package from the ground up.