Hi,
When I switched from readtable() to CSV.read() for reading tabular data, I was surprised by an extreme slowdown. I attached a small benchmark which seems to indicate that the slowdown is in the append!() function.
Thank you for any advice!
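For reproducibility, here is a sketch of how a stand-in test.tsv could be generated (the column names and sizes are made up, since I haven't described the original file's contents):

using DataFrames
using CSV

# Hypothetical stand-in for test.tsv (not the real data):
# 3000 rows of mixed-type columns, tab-separated.
df = DataFrame(id = 1:3000,
               name = string.("row", 1:3000),
               value = rand(3000))
CSV.write("test.tsv", df; delim='\t')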
using DataFrames
using CSV
function newDF(nrow::Int64)
    # Read with CSV.jl, then append rows one at a time
    x = CSV.read("test.tsv"; delim="\t", header=true)
    y = similar(x, 0)
    for i in 1:nrow
        append!(y, x[i, :])
    end
    return y
end
function oldDF(nrow::Int64)
    # Same loop, but reading with the deprecated readtable()
    x = readtable("test.tsv", separator='\t', header=true)
    y = similar(x, 0)
    for i in 1:nrow
        append!(y, x[i, :])
    end
    return y
end
@time oldDF(3000)
@time newDF(3000)
@time oldDF(3000)
@time newDF(3000)
#=
Results:
readtable()
WARNING: readtable is deprecated, use CSV.read from the CSV package instead
4.402112 seconds (3.77 M allocations: 257.848 MiB, 4.21% gc time)
CSV
240.959365 seconds (518.44 M allocations: 41.142 GiB, 4.02% gc time)
readtable()
WARNING: readtable is deprecated, use CSV.read from the CSV package instead
0.947368 seconds (1.86 M allocations: 157.937 MiB, 15.46% gc time)
CSV
231.824696 seconds (513.78 M allocations: 40.898 GiB, 4.00% gc time)
=#
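In case it helps narrow things down: the per-row append!() can be avoided entirely by slicing, which sidesteps whatever is slow in the append path. This is only a sketch (newDF_sliced is a name I made up, and I haven't timed it against the original file):

using DataFrames
using CSV

function newDF_sliced(nrow::Int64)
    x = CSV.read("test.tsv"; delim="\t", header=true)
    # Take the first nrow rows in a single indexing operation
    # instead of appending them one at a time.
    return x[1:nrow, :]
end

Of course this doesn't explain why append!() itself is so much slower on a CSV.read() result than on a readtable() result, which is the part I'd like to understand.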