CSV.jl type stability

If I understand correctly, rowtable is going to consume the entire file and build a vector of tuples, which could be enormous. It seems better to consume one row at a time and construct the special type as you go, as in my loop. It's possible I'm misunderstanding, though.
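For concreteness, here is a minimal sketch of the row-at-a-time idea using CSV.Rows; the MyRecord type and the column names a, b, x, s are placeholders, not the actual schema:

```julia
using CSV

# Hypothetical record type; replace the fields with the real columns.
struct MyRecord
    a::Int
    b::Int
    x::Float64
    s::String
end

function read_records(path)
    records = MyRecord[]
    # Passing `types` lets CSV.Rows parse each field instead of returning strings.
    for row in CSV.Rows(path; types = [Int, Int, Float64, String])
        push!(records, MyRecord(row.a, row.b, row.x, row.s))
    end
    return records
end
```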

@dlakelan, you are right, but it is still very interesting code.

As for the CSV.Rows approach, it seems to be much less efficient in this case than Greg Plowman's eachline parsing (I've tested with 1M rows, with 4 Ints, 3 Float64s, and 1 String in each row). Are you seeing the same thing?
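For reference, this is roughly what the eachline-style manual parsing looks like for that layout (not Greg Plowman's exact code; the Record type and the comma delimiter are assumptions):

```julia
# Hypothetical record type for 4 Ints, 3 Float64s and 1 String per row.
struct Record
    i1::Int; i2::Int; i3::Int; i4::Int
    f1::Float64; f2::Float64; f3::Float64
    s::String
end

function parse_file(path)
    out = Record[]
    for line in eachline(path)
        f = split(line, ',')   # assumes comma-delimited fields with no quoting
        push!(out, Record(
            parse(Int, f[1]), parse(Int, f[2]), parse(Int, f[3]), parse(Int, f[4]),
            parse(Float64, f[5]), parse(Float64, f[6]), parse(Float64, f[7]),
            String(f[8])))
    end
    return out
end
```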

Thank you for all the suggestions everyone! I will play around with all of them.

@rafael.guerra I am actually seeing the opposite: the manual parsing is taking longer than reading with type annotations. In fact, the manual parsing is taking longer than reading without type annotations.
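(To be clear about what I mean by "type annotations": passing explicit column types to CSV.File so it skips type inference. Something along these lines, where the column layout is just a placeholder:)

```julia
using CSV

# Assumed form of the "type-annotated" read: explicit column types passed to
# CSV.File so no inference is needed. The types listed here are placeholders.
f = CSV.File("data.csv";
             types = [Int, Int, Int, Int, Float64, Float64, Float64, String])
```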

Hmm, it actually seems the slowness is coming from the GZip package that I am using to decompress the file, rather than from the conversions themselves. Let me try with versions of the file that are already decompressed on disk.

Ok, so now I am seeing that the manual reading and the type-annotated reading are comparable, and that reading without type annotations is about half as fast.

The manual reading is still slightly slower (by about 10%) once both it and the type-annotated CSV.File have already been compiled. (On the first run, the manual reading is about 50% faster because of its significantly lower compile time.)

If you load all the data into memory anyway in the form of those objects, then an extra copy of the same data won't hurt performance much.
And when performance is needed, column-oriented storage is often better. For example, with StructArrays there's a very efficient solution: CSV.File(...) |> columntable |> StructArray{MyCustomType}
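A minimal sketch of that pipeline, assuming a hypothetical MyCustomType whose field names match the CSV's column names and whose field types match the parsed column types (no missing values):

```julia
using CSV, Tables, StructArrays

# Hypothetical element type; its field names must match the CSV column names.
struct MyCustomType
    a::Int
    x::Float64
end

# Parse into columns once, then wrap the columns as a StructArray of
# MyCustomType without copying each row into a separate heap-allocated object.
sa = CSV.File("data.csv") |> Tables.columntable |> StructArray{MyCustomType}

sa[1]   # materializes a MyCustomType from the underlying columns
sa.x    # direct access to the Float64 column
```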