.csv number of rows

austinbean · August 12, 2019, 11:18pm

Is there a clever way to determine the number of rows in a large .csv or .tsv file without reading a whole column?

I’m currently doing:

CSV.File("big_file.csv") |> Tables.select(:first_column_name) |> DataFrame

which tells me what I need to know, but if there’s something which just counts the rows a little faster, I would love to know.

jling · August 12, 2019, 11:34pm

eh… you can just find out the number of lines in the file instead…?

julia> (open("./Final.csv") |> readlines |> length) -1
1936

aplavin · August 13, 2019, 12:32am

Be careful with this approach: it will give the wrong answer for CSV files that have line breaks inside some values.
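
To make the failure mode concrete, here is a minimal sketch using only Base Julia. The file contents and the temporary path are made up for illustration; the point is just that a quoted field containing a line break is one record but two physical lines.

```julia
# Hypothetical sample file: a header plus 2 records, where the second
# record has a line break inside a quoted field.
path = tempname() * ".csv"
write(path, "id,comment\n1,\"plain\"\n2,\"spans\ntwo lines\"\n")

# Naive count: physical lines minus the header line.
naive = length(readlines(path)) - 1

println(naive)  # prints 3, although the file holds only 2 records
```

The count is off by one for every embedded line break, so the error grows with the number of multi-line values in the file.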

austinbean · August 13, 2019, 1:14am

Thanks. I figured it was easy, but basic googling did not reveal it.

xiaodai · August 13, 2019, 3:16am

I always think that wc -l is much faster, but of course Windows doesn’t come with such tools pre-installed, which is a shame.

quinnj · August 13, 2019, 4:30pm

As @aplavin mentioned, just doing readlines can be incorrect for csv files w/ quoted newline characters. Using the readlines function is also pretty wasteful and will gobble up a lot of memory for really large files. In Base, the countlines function will be much more efficient.
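
As a rough sketch of the countlines approach (the sample file and temporary path here are invented for illustration, and the file is assumed to have exactly one header line and no quoted line breaks):

```julia
# Hypothetical sample file: one header line followed by three data rows.
path = tempname() * ".csv"
write(path, "a,b\n1,2\n3,4\n5,6\n")

# countlines scans the file for end-of-line characters without building
# a vector of strings, so memory use stays small even for large files.
nrows = countlines(path) - 1  # subtract the header line

println(nrows)
```

Note that countlines still counts physical lines, so it has the same quoted-newline caveat as readlines; it only avoids the memory cost.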

For a more general purpose solution for csv files that may contain quoted newline characters, this should be extremely fast/efficient:

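using CSV

function countcsvlines(file)
    n = 0
    # CSV.Rows streams the file one row at a time; reusebuffer=true reuses
    # a single row buffer instead of allocating a new one per row
    for row in CSV.Rows(file; reusebuffer=true)
        n += 1
    end
    return n
end

mrip · September 13, 2022, 9:41pm

UPDATE: @quinnj's post originally spelled the keyword “resusebuffer=true”.
I have found that “reusebuffer=true” works much better.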

But other than that…
Amazing. This is gonna help so much with pre-allocating. Loading this and a first row reader into every CSV analysis from now on!!!
