First try seems a bit sluggish

I'm an old 'C' programmer, but a total newbie here, so I guess I'm doing something wrong. I'd like to write an application requiring CSV file I/O, so I found the following sample code. I'm on Windows 7, using 64-bit julia-1.5.3 downloaded as a zip, with the environment path set to its bin. I invoked julia from a command shell and then entered:

using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
using CSV
using DataFrames
write("test.csv",
"""
a,b,c
1,2,3
4,5,6
""")
18 <== returned by system (fairly quickly)

Then I type in:
CSV.read("test.csv", DataFrame)

And AFTER ABOUT 20 SECONDS the following is returned:
2×3 DataFrame
 Row │ a      b      c
─────┼─────────────────────
   1 │     1      2      3
   2 │     4      5      6

If I put the above in a test.jl file and invoke:

julia test.jl

It does nothing for a few seconds and just returns to the prompt.

It has to pre-compile everything. I would recommend a workflow where you use include("test.jl") at the REPL in an open Julia session, rather than running julia test.jl over and over again.
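
For example, in a single REPL session the compilation cost is only paid on the first call (test.jl here is the script from above):

julia> include("test.jl")   # first call: loads and compiles CSV/DataFrames (slow)

julia> include("test.jl")   # same session: everything is already compiled, so it's fast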


Yes, the first run of any big package will be slow due to compilation overhead, but subsequent runs (within the same Julia session) are fast.

julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
 15.013381 seconds (26.49 M allocations: 1.370 GiB, 6.63% gc time)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
  0.002253 seconds (645 allocations: 48.016 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

julia> using BenchmarkTools

julia> @btime CSV.read("test.csv", DataFrame)
  211.800 μs (151 allocations: 15.64 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

If you care more about fast startup and don’t need the feature-completeness of CSV.jl/DataFrames.jl, you may want to use the built-in DelimitedFiles.jl library:

julia> @time begin
           using DelimitedFiles
           readdlm("test.csv", ',', header=true)
       end
  0.024962 seconds (26.71 k allocations: 1.696 MiB, 86.93% compilation time)
([1.0 3.0 5.0; 2.0 4.0 6.0], AbstractString["a" "b" "c"])

julia> @btime readdlm("test.csv", ',', header=true)
  123.900 μs (39 allocations: 41.92 KiB)
([1.0 3.0 5.0; 2.0 4.0 6.0], AbstractString["a" "b" "c"])

Thanks for the tips so far. Yes, it really speeds up the second time.

My goal is to use julia as a scripting engine inside a (gasp) wxWidgets C++ application, basically inhaling tons of financial CSV files and processing them.

Can Julia be set up so that some of these long initialization (pre-compile) steps are only done once, and not every time my application is booted? Even better, can things like CSV and DataFrames be precompiled and packaged with the rest of my application so the end user never has to experience these delays?

No offense intended, and I don't know how large these two packages are, but it SEEMS like I can compile a fairly large GCC C++ project in much less time. I do like Julia, though…

Yes, this is what PackageCompiler does: you can build them into a compiled system image. (That's also why the standard library is not slow to load.) It used to be a bit scary, but I hear it is pretty straightforward now. (I still haven't gotten around to doing it myself.)
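
For reference, a rough sketch of what building such a system image can look like (assuming PackageCompiler.jl is installed; the sysimage filename is just an example, and the extension depends on your platform, e.g. .dll on Windows):

using PackageCompiler

# Bake CSV and DataFrames into a custom system image so that
# `using CSV, DataFrames` no longer pays the compilation cost.
# "csv_sysimage.dll" is an arbitrary example name.
create_sysimage([:CSV, :DataFrames]; sysimage_path="csv_sysimage.dll")

You then start Julia with that image, e.g. julia --sysimage csv_sysimage.dll, and loading those packages should be nearly instantaneous.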

You could also use its app-building functionality, which sounds ideal for your use case, but I have even less experience with that.
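
For completeness, the app route is PackageCompiler's create_app. A minimal sketch, assuming a hypothetical package directory MyCSVTool/ whose main module defines a julia_main() entry point (both paths below are placeholders):

using PackageCompiler

# Compile the project in "MyCSVTool" into a self-contained bundle under
# "MyCSVToolCompiled" that ships its own Julia runtime and compiled packages.
create_app("MyCSVTool", "MyCSVToolCompiled")

End users run the produced executable directly and never install Julia or the packages themselves.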


Depending on the exact application and setup (how you intend to call Julia from your program), this may be interesting:
