Extremely slow CSV / IO?


The following very simple code takes about 25 seconds to run:

using CSV

#pulseit = CSV.Rows("Data_CH0@DT5790N_13963_run_192.csv")
pulseit = CSV.Rows("foo.csv")
(curr_pulse, pulse_state) = iterate(pulseit)
(next_pulse, pulse_state) = iterate(pulseit, pulse_state)

println(curr_pulse)
println(next_pulse)

I’m just running “time julia process.jl” at the command line, and the user time is 25.265s. The lengthy foo.csv file is:

$ cat foo.csv
col1,col2,col3
A,12,2.0
B,22,5.1

Am I missing something obvious?

(Incidentally, the reason for using an iterator instead of just a for loop is the fact that I’ll eventually have two separate very large files that I’ll be iterating over simultaneously, but one of them iterates much faster than the other.)
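For what it's worth, here is a minimal sketch of the kind of lock-step iteration I have in mind, using the same `iterate` protocol as above. The function name `merge_matches` and the two integer vectors are hypothetical stand-ins for the two files, and the sketch assumes both streams are sorted on a shared key, which may not match the real data.

```julia
# Hypothetical helper: walk two sorted iterables side by side and
# collect the keys that appear in both. Each side advances only when needed,
# so one stream can move much faster than the other.
function merge_matches(fast, slow)
    matches = Tuple{Int,Int}[]
    f = iterate(fast)
    s = iterate(slow)
    # iterate returns nothing once an iterable is exhausted
    while f !== nothing && s !== nothing
        (fval, fstate) = f
        (sval, sstate) = s
        if fval < sval
            f = iterate(fast, fstate)      # advance only the fast stream
        elseif fval > sval
            s = iterate(slow, sstate)      # advance only the slow stream
        else
            push!(matches, (fval, sval))   # keys agree: record and advance both
            f = iterate(fast, fstate)
            s = iterate(slow, sstate)
        end
    end
    return matches
end

result = merge_matches([1, 2, 3, 4, 5, 6], [2, 5])
println(result)
```

Wrapping the loop in a function also sidesteps the top-level scoping rules for loops in script files, and lets Julia compile the loop once.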

I’ve done a couple of simple tests, just to see if I can isolate what’s going on.

  • Timing a simple println("Hello, World!") script takes 0.1s to start and run, which certainly seems reasonable.
  • Simply adding “using CSV” as the first line of my hello world file extends the runtime to 2.6s. That seems pretty ridiculous, but it’s not 25s.

Please read the posts in this forum. The Julia workflow is different from that of other languages, where you save a script and run it from scratch every time.

Try starting a Julia session and running include("file.jl") from there instead, so you avoid recompiling everything again and again.
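As a rough illustration, the sketch below times the same script twice within one session. The temporary script and its `parse_row` function are hypothetical stand-ins that use only Base, so this is not the original process.jl and it does not load CSV. The timings depend on the machine and Julia version, so none are shown here.

```julia
# Write a small stand-in script to a temporary file (hypothetical contents).
script = joinpath(mktempdir(), "process.jl")
write(script, """
parse_row(line) = split(line, ',')
rows = [parse_row(l) for l in ["A,12,2.0", "B,22,5.1"]]
""")

# The first include pays for compiling everything the script touches.
first_run = @elapsed include(script)

# The second include runs in the same session, so code that was already
# compiled (and any packages already loaded) does not start from zero.
second_run = @elapsed include(script)

println("first include:  ", first_run, " s")
println("second include: ", second_run, " s")
```

In a real session the big saving usually comes from packages: `using CSV` is loaded once, and the methods it has already compiled stay available for every later include.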

Ah, I see. Thanks!

The TLDR is that whenever you run a script from scratch with "julia process.jl", you are recompiling a bunch of code.