CSV.jl number of lines

Michal · November 2, 2021, 7:55am

I cannot use the keyword argument “resusebuffer” in the latest version of the CSV.jl package. How can I calculate the exact number of lines in a CSV file?

function countcsvlines(file)
    n = 0
    for row in CSV.Rows(file; resusebuffer=true)
        n += 1
    end
    return n
end

I tried to use:

CSV.Rows(fl.buf).ctx.rowsguess

but it gives me higher numbers: a file with 5 lines gives 7, and a file with 52 lines gives 57.

Sukera · November 2, 2021, 8:34am

Did you mean to write reusebuffer instead of resusebuffer (note the extra s in “resuse”)?

Other than that, you may need to set the header keyword, depending on whether your dataset has a header row with column names (there may also be other metadata in front of the header, depending on the CSV). With header=false, it works for me:

julia> function countcsvlines(file)
           n = 0
           for row in CSV.Rows(file; header=false, reusebuffer=true)
               n += 1
           end
           return n
       end
countcsvlines (generic function with 1 method)

julia> countcsvlines("example.csv")
8

julia> readlines("example.csv")
8-element Vector{String}:
 "1,2,3,4"
 "4,5,6,7"
 "1,2,3,4"
 "4,5,6,7"
 "1,2,3,4"
 "4,5,6,7"
 "1,2,3,4"
 "4,5,6,7"
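
To make the effect of the header keyword concrete, here is a minimal sketch, assuming CSV.jl is installed. The sample data and the variable names are made up for illustration. With the default setting the first line is treated as column names and is not counted as a row:

```julia
using CSV

# Hypothetical sample data: four physical lines, no column names.
data = "1,2,3,4\n4,5,6,7\n1,2,3,4\n4,5,6,7\n"

path = tempname()
write(path, data)

# Default header handling: the first line is consumed as column names.
with_header = count(_ -> true, CSV.Rows(path))

# header=false: every line is treated as a data row.
without_header = count(_ -> true, CSV.Rows(path; header=false))

println((with_header, without_header))
```

The two counts should differ by exactly one, which is worth checking whenever a row count looks off by one.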

Michal · November 2, 2021, 8:42am

Thank you, I meant reusebuffer. My fault.

rafael.guerra · November 2, 2021, 8:51am

Just in case, there is a built-in function:

file = raw"C:\...\input.csv"
countlines(file)

Michal · November 2, 2021, 8:53am

Thank you @rafael.guerra. Is it possible to do something like the following code?

using CSV
csvRows = CSV.Rows("example.csv")
countlines(csvRows.buf)

Sukera · November 2, 2021, 9:12am

That won’t work if the CSV contains a quoted field with a newline character - CSV.jl accounts for that.

CSV.Rows is a lazy iterator. The documentation describes it as:

“CSV.Rows: an alternative approach for consuming delimited data, where the input is only consumed one row at a time”

As such, you either have to iterate manually or collect it to figure out how many rows there are.

You can’t get around that limitation simply because CSV is much more complicated than you may initially assume - it’s not standardized, different escaping mechanisms exist and a newline character may not always indicate a new row.
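
A small sketch of the difference, assuming CSV.jl is installed. The file contents and variable names are hypothetical. Two of the three records contain a quoted field that spans two physical lines, so counting newline characters and counting parsed rows give different answers:

```julia
using CSV

# Hypothetical file: 3 records, two of which hold a quoted field with an embedded newline.
data = "id,\"first\nlabel\"\n1,\"second\nlabel\"\n2,plain\n"

path = tempname()
write(path, data)

# countlines only looks at newline characters in the raw file.
physical_lines = countlines(path)

# CSV.Rows parses quoting, so an embedded newline does not start a new row.
csv_rows = count(_ -> true, CSV.Rows(path; header=false))

println("physical lines: ", physical_lines, ", CSV rows: ", csv_rows)
```

For files that are known to contain only unquoted numeric data, the two numbers coincide and countlines is the cheaper option.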

Michal · November 2, 2021, 9:33am

Thank you for an explanation.

Oscar_Smith · November 2, 2021, 12:22pm

I think this will give incorrect results because CSVs can have escaped newline characters.

rafael.guerra · November 2, 2021, 1:28pm

@Oscar_Smith and @Sukera, thanks for the insights.
Is this a common/possible situation when working with CSVs containing only numeric data?

Oscar_Smith · November 2, 2021, 1:30pm

This only occurs for CSVs with strings in them.

giordano · November 2, 2021, 1:35pm

It’s unfortunate that you didn’t get an error for the misspelled keyword argument, though.

rafael.guerra · November 2, 2021, 2:13pm

Thanks Oscar.
To better understand this, I tested a simple, familiar Excel scenario, using ALT+ENTER to split string labels across two lines within a single cell.
The OP’s function does return the correct number of CSV rows (3), while the physical CSV file on disk has 5 lines:

Excel: [image: escaped_sequences_newline_csv_Excel]

Notepad++: [image: escaped_sequences_newline_csv_notepadplus]

rafael.guerra · November 3, 2021, 9:05pm

The Base function count() may do what you ask for here:

csvRows = CSV.Rows(file; header=false)

julia> count(i -> i==i, csvRows)
3

NB: There should be a better way of writing this part: i -> i==i

aplavin · November 3, 2021, 9:36pm

It’s _ -> true; another alternative, Returns(true), will be available when Julia 1.7 is released.
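
A quick Base-only sketch of why the always-true predicate is the safer choice. The vector `xs` is a stand-in for any iterator, not CSV data. A predicate like `x -> x == x` depends on how `==` behaves for the elements, and `NaN == NaN` is false:

```julia
# Stand-in collection; any iterator works the same way.
xs = [1.0, NaN, 3.0]

# Depends on element equality: NaN is not equal to itself, so it is skipped.
n_self_equal = count(x -> x == x, xs)

# Ignores the element entirely and counts every item.
n_lambda = count(_ -> true, xs)

# Same idea with the callable from Base (requires Julia 1.7 or later).
n_returns = count(Returns(true), xs)

println((n_self_equal, n_lambda, n_returns))
```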
