Unable to read a CSV file

Earlier I used this function very frequently, but now it is taking a very long time to read a CSV file and finally gives no result, even after 10 minutes.

This is the code I'm using, which read successfully in previous instances.

df1 = CSV.read("path//data.csv", DataFrame, missingstring = "", header = 2)

Can someone help me understand why this is happening? Thank you.

What is the size of the file? Can you share it? What is your version of CSV.jl?

Here is my full package list:

(@v1.7) pkg> st
      Status `~/.julia/environments/v1.7/Project.toml`
  [cbdf2221] AlgebraOfGraphics v0.6.5
  [c52e3926] Atom v0.12.36
  [336ed68f] CSV v0.10.2
  [5d742f6a] CSVFiles v1.0.1
  [13f3f980] CairoMakie v0.7.3
  [8be319e6] Chain v0.4.10
  [a93c6f00] DataFrames v1.3.2
  [1313f7d8] DataFramesMeta v0.10.0
  [28b8d3ca] GR v0.64.0
  [c91e804a] Gadfly v1.3.4
  [7073ff75] IJulia v1.23.2
  [e5e0dc1b] Juno v0.8.4
  [bd3c0b08] MissingsAsFalse v0.1.0
  [3beb2ed1] PDFmerger v0.2.0
  [69de0a69] Parsers v2.2.2
  [91a5bcdd] Plots v1.25.10
  [d330b81b] PyPlot v2.10.0
  [1277b4bf] ShiftedArrays v1.0.0
  [f3b207a7] StatsPlots v0.14.33
  [ade2ca70] Dates

The data contains around 12,000 rows and 18 columns.

Try using skipto to read only a couple of lines.

This is the code I used:

df1 = CSV.read("path//data.csv", DataFrame, missingstring = "", header = 2, skipto = 4)

It is still loading after more than 30 minutes.

At the same time, the load function in CSVFiles is very quick even for big data.
My only doubt is that I used to read the same data frame using CSV.read, but it suddenly started taking a very long time and giving no result at the end.

I want to use the CSV.read option as well!

You’ll need to skip more than 4 lines to make a difference.

Why not just limit = 2 if you only want to read a couple of lines?


Setting limit to 11000 gives the same result.

Just to be clear skipto=n skips the first n rows, while limit=n will only read the first n rows. So if you only want to read 2 rows, you do limit=2 or skipto=x-2 where x is the total number of rows in your file.

That said, 11,000 rows isn't very long and shouldn't take more than a second or two, depending on how many columns you have. You still haven't told us the size of the file.
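To make the distinction concrete, here is a minimal sketch on a tiny throwaway file (the name `sample.csv` is just an example, not from the original thread):

```julia
using CSV, DataFrames

# Write a small sample file: one header row plus four data rows
open("sample.csv", "w") do io
    write(io, "a,b\n1,2\n3,4\n5,6\n7,8\n")
end

# limit = 2 reads only the first 2 data rows
df_limit = CSV.read("sample.csv", DataFrame; limit = 2)

# skipto = 4 starts parsing data at line 4 of the file,
# so the first two data rows are not read
df_skip = CSV.read("sample.csv", DataFrame; skipto = 4)
```

Here `df_limit` contains the rows starting with 1 and 3, while `df_skip` contains the rows starting with 5 and 7.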

File size: 903 KB.

That size should take a fraction of a second to read. Can you share the file? Is this a problem for all files you are reading in or just a specific file?

Hi, I tried with different CSV files. It's happening with all of them.

Is the file located on a network drive? Even so, the timing you reported is still unreasonably long…

It’s very hard to check what’s going on if you can’t share the csv file. What if you do:

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(1_000_000, 10), :auto));

julia> filesize("test.csv")/1e6 # This is about a 200MB csv

julia> @time CSV.read("test.csv", DataFrame);
 10.003148 seconds (42.79 M allocations: 1.836 GiB, 3.27% gc time, 85.23% compilation time)

julia> @time CSV.read("test.csv", DataFrame);
  1.297614 seconds (40.00 M allocations: 1.723 GiB, 15.92% gc time)

The first call is to get a sense of the compilation overhead; the second call is the “typical” time after compilation. So reading a 200MB csv takes about a second on my machine. This is with a single thread; when adding threads I get (in a new session, second call to CSV.read):

julia> Threads.nthreads()

julia> @time CSV.read("test.csv", DataFrame);
  0.351343 seconds (3.62 k allocations: 81.311 MiB)
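For reference, multithreaded parsing only kicks in if Julia itself was started with more than one thread; the thread count is fixed at startup. A sketch (the count 4 is just an example):

```shell
# Start Julia with 4 threads so CSV.read can parse in parallel;
# alternatively, set the JULIA_NUM_THREADS environment variable
julia --threads 4
```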

I had an equivalent issue. After trawling discussions, I found a suggestion to pin Parsers to version 2.2.0, which fixed it for me:

(@v1.7) pkg> add Parsers@2.2.0
(@v1.7) pkg> pin Parsers

Thank you, I tried it… but no result. It sometimes hangs during compilation, so I switched to DelimitedFiles.
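For anyone taking the same detour: a minimal sketch of the DelimitedFiles workaround (readdlm ships with Julia's standard library, so no extra package is needed; the file name is hypothetical):

```julia
using DelimitedFiles

# Write a small sample file: one header row plus two data rows
open("sample.csv", "w") do io
    write(io, "a,b\n1,2\n3,4\n")
end

# With header = true, readdlm returns a (data, header) tuple;
# here every value parses as Int
data, hdr = readdlm("sample.csv", ',', Int; header = true)
```

Note that readdlm returns a plain matrix rather than a DataFrame, and it does not handle quoted fields or missing values the way CSV.jl does.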

Doesn't skipto=n mean the row to start with? So the term “skip” is a bit irritating here, or am I wrong?

I don’t know, seems pretty self-explanatory to me. “Skip to the nth line” isn’t any less clear than “start at the nth line”, and can’t be confused with Startat, my local tattoo parlor.


I think the only confusion was my choice of words - indeed skipto skips to the n-th row, rather than skipping the first n rows (which would imply skipping to row n+1).
