Did you mean to write reusebuffer instead of resusebuffer (note the extra s in “resuse”)?
Other than that, you may need to set the header keyword, depending on whether your dataset has a header row with column names (some CSV files also carry other metadata before the header). With that set, it works for me:
julia> using CSV
julia> function countcsvlines(file)
           n = 0
           # header=false: treat every line as data, so the first line is counted too
           for row in CSV.Rows(file; header=false, reusebuffer=true)
               n += 1
           end
           return n
       end
countcsvlines (generic function with 1 method)
julia> countcsvlines("example.csv")
8
julia> readlines("example.csv")
8-element Vector{String}:
"1,2,3,4"
"4,5,6,7"
"1,2,3,4"
"4,5,6,7"
"1,2,3,4"
"4,5,6,7"
"1,2,3,4"
"4,5,6,7"
That won’t work if the CSV contains a quoted field with a newline character - CSV.jl accounts for that.
CSV.Rows is a lazy iterator. As the CSV.jl documentation puts it:
“CSV.Rows: an alternative approach for consuming delimited data, where the input is only consumed one row at a time”
As such, you either have to iterate manually or collect it to figure out how many rows there are.
You can’t get around that limitation, because CSV is more complicated than it may first appear: it’s not standardized, different escaping mechanisms exist, and a newline character does not always indicate a new row.
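To make the difference concrete, here is a minimal sketch that compares physical lines with parsed rows. The temporary file and its contents are made up for illustration, and it assumes the first line is a header (header=1); adjust that to your data. On this particular file, countlines should see 4 lines while CSV.Rows yields 2 rows, because the quoted field contains a newline.

```julia
using CSV

# Hypothetical sample file: the quoted field in the first data row
# spans two physical lines.
path = tempname() * ".csv"
write(path, "id,label\n1,\"first\nsecond\"\n2,plain\n")

# Physical lines on disk: the header, two lines for row 1, one line for row 2.
physical_lines = countlines(path)

# Logical CSV rows: CSV.Rows respects quoting, so the embedded newline
# does not start a new row. count(_ -> true, itr) just walks the lazy iterator.
logical_rows = count(_ -> true, CSV.Rows(path; header=1, reusebuffer=true))

println("physical lines: ", physical_lines)
println("CSV rows:       ", logical_rows)
```

Counting while iterating keeps memory use low. Note that collecting the rows instead is not a good fit with reusebuffer=true, since the rows then share a single buffer.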
Thanks Oscar.
To better understand this, I tested a simple, familiar Excel scenario: using ALT+ENTER to split string labels across two lines within a single cell.
The OP’s function does report the correct number of CSV rows (3), while the physical CSV file on disk has 5 lines: