Earlier I used the CSV.read function very frequently. But now it takes a very long time to read a CSV file, and it still gives no result even after 10 minutes.
This is the code I'm using, which read the file successfully in previous instances.
At the same time, the load option in CSVFiles is very quick, even for big data.
My only question is this: I used to read the same data frame with CSV.read, but it suddenly started taking a very long time and gives no result at the end.
Just to be clear, skipto=n skips the first n rows, while limit=n will only read the first n rows. So if you only want to read 2 rows, you do limit=2 or skipto=x-2 where x is the total number of rows in your file.
That said, 11,000 rows isn't very many and shouldn't take more than a second or two to read, depending on how many columns you have. You still haven't told us the size of the file.
That size should take a fraction of a second to read. Can you share the file? Is this a problem for all files you are reading in or just a specific file?
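If the file itself can't be shared, a few numbers about it would already help. The following is only a rough sketch of how to collect them: the path and the stand-in file written here are hypothetical, so substitute the real file. It reports the file size, the line count, and the shape of a five-row preview read with limit, which shows whether parsing works at all on the first rows.

```julia
using CSV, DataFrames

# Hypothetical stand-in file so the sketch runs on its own;
# point `path` at the real CSV instead.
path = joinpath(mktempdir(), "data.csv")
CSV.write(path, DataFrame(rand(1_000, 5), :auto))

size_mb = filesize(path) / 1e6        # file size in megabytes
nlines  = countlines(path)            # header line plus data lines

# Read only the first 5 data rows; if even this hangs, the problem is
# unlikely to be the sheer size of the file.
preview = CSV.read(path, DataFrame; limit = 5)

println("size: ", size_mb, " MB, lines: ", nlines,
        ", preview: ", nrow(preview), " rows x ", ncol(preview), " columns")
```

If the preview comes back instantly but the full read does not, that points at something further down in the file rather than at CSV.read in general.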
It’s very hard to check what’s going on if you can’t share the csv file. What if you do:
julia> using CSV, DataFrames
julia> CSV.write("test.csv", DataFrame(rand(1_000_000, 10), :auto));
julia> filesize("test.csv")/1e6 # This is about a 200MB csv
192.697339
julia> @time CSV.read("test.csv", DataFrame);
10.003148 seconds (42.79 M allocations: 1.836 GiB, 3.27% gc time, 85.23% compilation time)
julia> @time CSV.read("test.csv", DataFrame);
1.297614 seconds (40.00 M allocations: 1.723 GiB, 15.92% gc time)
The first call is there to get a sense of the compilation overhead; the second call is the "typical" time after compilation. So reading in a 200MB csv takes about a second on my machine. This is with a single thread; when adding threads I get (in a new session, second call to CSV.read):
I don’t know, seems pretty self-explanatory to me. “Skip to the nth line” isn’t any less clear than “start at the nth line”, and can’t be confused with Startat, my local tattoo parlor.
I think the only confusion was my choice of words - indeed skipto skips to the n-th row, rather than skipping the first n rows (which would imply skipping to row n+1).
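For anyone who wants to check the skipto and limit behaviour for themselves, here is a small sketch. It assumes a throwaway ten-row file with the header on line 1 (the column names id and value are made up for the example), so that file line k+1 holds the row with id == k.

```julia
using CSV, DataFrames

# Throwaway file: header on line 1, data rows with id 1..10 on lines 2..11.
path = tempname() * ".csv"
CSV.write(path, DataFrame(id = 1:10, value = (1:10) .* 10))

# limit = 2 reads only the first two data rows.
first_two = CSV.read(path, DataFrame; limit = 2)

# skipto = 4 starts reading data at line 4 of the file,
# which is the row with id == 3 (line 1 is the header).
from_line_4 = CSV.read(path, DataFrame; skipto = 4)

# To read only the last two rows, start at the second-to-last line.
last_two = CSV.read(path, DataFrame; skipto = countlines(path) - 1)

println(first_two.id)
println(from_line_4.id)
println(last_two.id)
```

Note that skipto counts lines of the file including the header, so the offset for "the last two rows" is countlines(path) - 1 here rather than the number of data rows minus 2.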