First try seems a bit sluggish

I'm an old 'C' programmer, but a total newbie here, so I guess I'm doing something wrong. I'd like to write an application requiring CSV file I/O, so I found the following sample code. I'm on Windows 7, using 64-bit julia-1.5.3 downloaded as a zip, with the environment path set to its bin. I invoked julia from a command shell and then entered:

using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
using CSV
using DataFrames
write("test.csv",
"""
a,b,c
1,2,3
4,5,6
""")
18 <== returned by system (fairly quickly)

Then I type in:
CSV.read("test.csv", DataFrame)

And AFTER ABOUT 20 SECONDS the following is returned:
2×3 DataFrame
 Row │ a      b      c
─────┼─────────────────────
   1 │     1      2      3
   2 │     4      5      6

If I put the above in a test.jl file and invoke:

julia test.jl

It does nothing for a few seconds and just returns to the prompt.

It has to pre-compile everything. I would recommend a workflow where you use include("test.jl") at the REPL in an open Julia session, rather than running julia test.jl over and over again.
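
For example, in a single REPL session the compilation cost is only paid on the first call (test.jl here is the script from above):

julia> include("test.jl")   # first call: loads and compiles CSV/DataFrames (slow)

julia> include("test.jl")   # same session: everything is already compiled, so it's fast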


Yes, the first run of any big package will be slow due to compilation overhead, but subsequent runs (within the same Julia session) are fast.

julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
 15.013381 seconds (26.49 M allocations: 1.370 GiB, 6.63% gc time)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
  0.002253 seconds (645 allocations: 48.016 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

julia> using BenchmarkTools

julia> @btime CSV.read("test.csv", DataFrame)
  211.800 μs (151 allocations: 15.64 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6

If you care more about fast startup and don’t need the feature-completeness of CSV.jl/DataFrames.jl, you may want to use the built-in DelimitedFiles.jl library:

julia> @time begin
           using DelimitedFiles
           readdlm("test.csv", ',', header=true)
       end
  0.024962 seconds (26.71 k allocations: 1.696 MiB, 86.93% compilation time)
([1.0 3.0 5.0; 2.0 4.0 6.0], AbstractString["a" "b" "c"])

julia> @btime readdlm("test.csv", ',', header=true)
  123.900 μs (39 allocations: 41.92 KiB)
([1.0 3.0 5.0; 2.0 4.0 6.0], AbstractString["a" "b" "c"])

Thanks for the tips so far. Yes, it really speeds up the second time.

My goal is to use julia as a scripting engine inside a (gasp) wxWidgets C++ application, basically inhaling tons of financial CSV files and processing them.

Can Julia be set up so that some of these long initialization (pre-compile) steps are only done once, and not every time my application is booted? Even better, can things like CSV and DataFrames be precompiled and packaged with the rest of my application so the end user never has to experience these delays?

No offense intended, and I don't know how large these two packages are, but it SEEMS like I can compile a fairly large GCC C++ project in much less time. I do like Julia, though…

Yes, this is what PackageCompiler does: you can build them into a compiled system image. (That's also why the standard library is not slow to load.) It used to be a bit scary, but I hear it is pretty straightforward now. (I still haven't gotten around to doing it myself.)
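
For reference, a rough sketch of what building such a system image can look like (assuming PackageCompiler.jl is installed; the sysimage filename is just an example, and the extension depends on your platform, e.g. .dll on Windows):

using PackageCompiler

# Bake CSV and DataFrames into a custom system image so that
# `using CSV, DataFrames` no longer pays the compilation cost.
# "csv_sysimage.dll" is an arbitrary example name.
create_sysimage([:CSV, :DataFrames]; sysimage_path="csv_sysimage.dll")

You then start Julia with that image, e.g. julia --sysimage csv_sysimage.dll, and loading those packages should be nearly instantaneous.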

You could also use its app-building functionality, which sounds ideal for your use case, but I have even less experience with that.
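
For completeness, the app route is PackageCompiler's create_app. A minimal sketch, assuming a hypothetical package directory MyCSVTool/ whose main module defines a julia_main() entry point (both paths below are placeholders):

using PackageCompiler

# Compile the project in "MyCSVTool" into a self-contained bundle under
# "MyCSVToolCompiled" that ships its own Julia runtime and compiled packages.
create_app("MyCSVTool", "MyCSVToolCompiled")

End users run the produced executable directly and never install Julia or the packages themselves.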


Depending on the exact application and setup (how you intend to call Julia from your program), this may be interesting:
