As @aplavin mentioned, just calling readlines can give the wrong count for CSV files with quoted newline characters. readlines is also pretty wasteful: it materializes every line as a string, so it will gobble up a lot of memory for really large files. In Base, the countlines function is much more efficient.
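To make both points concrete, here is a minimal sketch using only Base. The helper name write_temp and the sample file contents are made up for illustration. The first file shows readlines and countlines agreeing on a plain file; the second shows a physical line count overstating the number of records once a field contains a quoted newline.

```julia
# Hypothetical helper: write `text` to a temporary file and return its path.
function write_temp(text)
    path, io = mktemp()
    write(io, text)
    close(io)
    return path
end

# A plain file: a header plus two data rows.
plain = write_temp("a,b\n1,2\n3,4\n")
n_readlines = length(readlines(plain))  # builds a String per line, then counts them
n_countlines = countlines(plain)        # scans for '\n' without keeping the lines

# A header plus ONE data record whose second field contains a quoted newline.
quoted = write_temp("a,b\n1,\"x\ny\"\n")
n_naive = countlines(quoted)            # counts physical lines, not CSV records

rm(plain)
rm(quoted)
```

For the second file the naive count is three physical lines, although the file holds only two records (the header and one row), which is why a CSV-aware count is needed.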
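For a more general-purpose solution for CSV files that may contain quoted newline characters, this (using CSV.jl) should be extremely fast and efficient:
    using CSV

    # Count the rows in a CSV file, treating quoted newlines as part of a field.
    function countcsvlines(file)
        n = 0
        # reusebuffer=true reuses a single row buffer instead of allocating one per row
        for row in CSV.Rows(file; reusebuffer=true)
            n += 1
        end
        return n
    end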
UPDATE: @quinnj's original snippet spelled the keyword "resusebuffer=true".
I have found that "reusebuffer=true" works much better.
But other than that…
Amazing. This is gonna help so much with pre-allocating. Loading this and a first row reader into every CSV analysis from now on!!!
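For what it's worth, here is a rough sketch of the pre-allocation idea using only Base. The function name first_column is hypothetical, and it assumes a simple file: one header line, no blank lines, no quoted newlines, and a numeric first column. For files with quoted newlines, the row count would have to come from a CSV-aware counter such as countcsvlines instead of countlines.

```julia
# Hypothetical helper: read the first column of a simple numeric CSV into a
# vector that is allocated once, using the line count known up front.
# Assumes one header line, no blank lines, and no quoted newlines.
function first_column(path)
    n = countlines(path) - 1            # data rows = physical lines minus the header
    out = Vector{Float64}(undef, n)     # allocate the full vector before reading
    for (i, line) in enumerate(Iterators.drop(eachline(path), 1))
        # Take the text before the first comma and parse it as a Float64.
        out[i] = parse(Float64, first(split(line, ',')))
    end
    return out
end
```

Knowing the row count first means the vector never has to grow while the file is being read.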