I cannot use the attribute “resusebuffer” in the latest version of the CSV.jl package. How can I calculate the exact number of lines in a CSV file?
n = 0
for row in CSV.Rows(file; resusebuffer=true)
    n += 1
end
I tried to use:
but it gives me higher numbers. A file with 5 lines gives 7, and a file with 52 lines gives 57.
Did you mean to write reusebuffer instead of resusebuffer (note the extra s in “resuse”)?
Other than that, you may need to set the header keyword, depending on whether your dataset has a header with column names (there may also be other metadata in front of the header, depending on the CSV). Apart from that, it works for me:
julia> function countcsvlines(file)
           n = 0
           # header=false counts every line, including a header line if present
           for row in CSV.Rows(file; header=false, reusebuffer=true)
               n += 1
           end
           return n
       end
countcsvlines (generic function with 1 method)
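To illustrate the effect of the header keyword mentioned above, here is a minimal sketch, assuming CSV.jl is installed. The sample data and the variable names are made up for illustration; the source is an in-memory buffer rather than a file.

```julia
using CSV

# Made-up sample: one header line followed by two data lines.
sample = "a,b\n1,2\n3,4\n"

# Default behaviour: the first line is treated as the header and is not counted.
with_header = count(_ -> true, CSV.Rows(IOBuffer(sample); reusebuffer=true))

# header=false: every line is treated as data, so the header line is counted too.
all_lines = count(_ -> true, CSV.Rows(IOBuffer(sample); header=false, reusebuffer=true))

println("default header: ", with_header, ", header=false: ", all_lines)
```

Which of the two counts is "correct" depends on whether you want the number of data rows or the number of records in the file.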
Thank you, I meant reusebuffer. My fault.
Just in case, there is a built-in function, countlines:
file = raw"C:\...\input.csv"
countlines(file)
Thank you, @rafael.guerra. Is it possible to do something like the following code?
csvRows = CSV.Rows("example.csv")
That won’t work if the CSV contains a quoted field with a newline character - CSV.jl accounts for that.
CSV.Rows is a lazy iterator
CSV.Rows: an alternative approach for consuming delimited data, where the input is only consumed one row at a time
As such, you either have to iterate manually or collect it to figure out how many rows there are.
You can’t get around that limitation simply because CSV is much more complicated than you may initially assume - it’s not standardized, different escaping mechanisms exist and a newline character may not always indicate a new row.
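The difference between physical lines and logical rows can be seen without any package. The following is only a sketch: count_records_sketch is a hypothetical helper, not CSV.jl's parser, and it assumes double-quoted fields in the RFC 4180 style and a made-up sample string.

```julia
# Hypothetical helper (not part of CSV.jl): counts records by tracking whether
# the reader is currently inside a double-quoted field.
function count_records_sketch(io::IO)
    n = 0
    inquote = false   # true while inside a quoted field
    sawdata = false   # true if the current record has at least one character
    while !eof(io)
        c = read(io, Char)
        if c == '"'
            # An escaped quote ("") toggles twice, so the state is unchanged.
            inquote = !inquote
            sawdata = true
        elseif c == '\n' && !inquote
            # A newline outside quotes ends the current record.
            n += 1
            sawdata = false
        else
            sawdata = true
        end
    end
    # Count a final record that is not terminated by a newline.
    return sawdata ? n + 1 : n
end

# Made-up sample: the second record has a quoted field containing a newline.
data = "id,label\n1,\"first\nsecond\"\n2,plain\n"

physical = countlines(IOBuffer(data))           # 4 newline-terminated lines
logical = count_records_sketch(IOBuffer(data))  # 3 records, header included

println("physical lines: ", physical, ", logical records: ", logical)
```

Real-world files can use other quote or escape characters, which is why a full parser such as CSV.jl is the safer choice for counting rows.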
Thank you for an explanation.
I think this will give incorrect results because CSVs can have escaped newline characters.
@Oscar_Smith and @Sukera, thanks for the insights.
Is this a common/possible situation when working with CSVs containing only numeric data?
This only occurs for CSVs with strings in them.
It's unfortunate that you didn’t get an error for the misspelled keyword argument, though.
To better understand this, I tested a simple, familiar Excel scenario, using ALT+ENTER to split string labels across two lines in a single cell.
The OP’s function does provide the correct number of CSV rows (3), while the physical CSV file on disk has more lines than that.
The Base function count() may do what you ask for here:
csvRows = CSV.Rows(file; header=false)
julia> count(i -> i==i, csvRows)
There should be a better way of writing this part: i -> i==i
You can write _ -> true instead; another alternative, Returns(true), will be available when Julia 1.7 gets released.
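As a rough illustration of these predicates, here is a sketch that uses eachline over an in-memory buffer as a stand-in for any lazy row iterator (the sample text is made up). The Returns(true) variant is guarded because it needs Julia 1.7 or later.

```julia
# Made-up sample text; eachline stands in for a lazy row iterator here.
text = "a,b\n1,2\n3,4\n"

# The predicate ignores its argument and always returns true,
# so count simply counts the elements the iterator yields.
n = count(_ -> true, eachline(IOBuffer(text)))

# Returns(true) builds the same always-true callable (Julia 1.7+ only).
@static if VERSION >= v"1.7"
    n_returns = count(Returns(true), eachline(IOBuffer(text)))
    @assert n_returns == n
end

println(n)
```

Note that a fresh iterator is created for each count, since iterating consumes the underlying buffer.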