I’m an old C programmer but a total newbie here, so I guess I’m doing something wrong. I would like to write an application requiring CSV file I/O, so I found the following sample code. I’m on Win 7, using 64-bit julia-1.5.3 downloaded as a zip, with the environment path set to its bin. I invoked julia from a command shell and then entered:
using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
using CSV
using DataFrames
write("test.csv",
"""
a,b,c
1,2,3
4,5,6
""")
18 <== returned by system (fairly quickly)
then I type in:
CSV.read("test.csv", DataFrame)
And AFTER ABOUT 20 SECONDS the following is returned:
2x3 DataFrame
Row | a b c
1 1 2 3
2 4 5 6
If I put the above in a test.jl file and invoke:
julia test.jl
It does nothing for a few seconds and just returns to the prompt.
It has to load and compile everything the first time it is used in a session. I would recommend a workflow where you use include("test.jl") at the REPL in an open Julia session rather than running julia test.jl over and over again.
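One likely reason the script seems to "do nothing": the REPL automatically displays the value of the last expression, but a script run with julia test.jl does not, so the DataFrame is read and then silently discarded. A minimal sketch of test.jl with an explicit print, assuming CSV and DataFrames are already installed (the variable name df is just for illustration):

```julia
# test.jl
using CSV, DataFrames

# Recreate the sample file from the question (18 bytes).
write("test.csv", """
a,b,c
1,2,3
4,5,6
""")

df = CSV.read("test.csv", DataFrame)

# Unlike the REPL, a script shows nothing unless it prints explicitly.
println(df)
```

From an open session, include("test.jl") runs the same file; only the first call in that session pays the compilation cost.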
Yes, the first run of any big package will be slow due to compilation overhead, but subsequent runs (within the same Julia session) are fast.
julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
 15.013381 seconds (26.49 M allocations: 1.370 GiB, 6.63% gc time)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6
julia> @time begin
           using CSV, DataFrames
           CSV.read("test.csv", DataFrame)
       end
  0.002253 seconds (645 allocations: 48.016 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6
julia> using BenchmarkTools
julia> @btime CSV.read("test.csv", DataFrame)
  211.800 μs (151 allocations: 15.64 KiB)
2×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      5
   2 │     2      4      6
If you care more about fast startup and don’t need the feature-completeness of CSV.jl/DataFrames.jl, you may want to use the built-in DelimitedFiles.jl library.
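A minimal sketch of the DelimitedFiles route, assuming the same three-column integer file as in the question. readdlm returns a plain matrix rather than a DataFrame; the names data and header are only illustrative:

```julia
using DelimitedFiles

# Recreate the sample file so the sketch is self-contained.
write("test.csv", "a,b,c\n1,2,3\n4,5,6\n")

# With header=true, readdlm returns a (data, header) tuple.
# Passing Int asks for an integer matrix instead of Float64/Any.
data, header = readdlm("test.csv", ',', Int; header=true)

println(header)            # 1×3 matrix holding the column names
println(data)              # 2×3 Matrix{Int} holding the values
println(sum(data[:, 1]))   # sum of column "a": 1 + 4 = 5
```

The trade-off is that there is no type inference per column, no missing-value handling, and no named-column access, so this fits best when the files are simple and uniform.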
Thanks for the tips so far. Yes, it really speeds up the second time.
My goal is to use julia as a scripting engine inside a (gasp) wxWidgets C++ application, basically inhaling tons of financial CSV files and processing them.
Can julia be set up so that some of these long initialization (pre-compile) steps are only done once, and not every time my application is booted? Even better, can things like CSV and DataFrames be precompiled and packaged with the rest of my application so the end user never has to experience these delays?
No offense intended, and I don’t know how large these two packages are, but it SEEMS like I can compile a fairly large GCC C++ project in much less time. I do like julia though…
Yes, this is what PackageCompiler does.
You can build them into a compiled system image.
(It’s also why the standard library is not slow to load.)
It used to be a bit scary, but I hear it is pretty straightforward now.
(I still haven’t gotten round to doing it myself.)
You could also use its app-building feature, which sounds ideal for your use case, but I have even less experience with it.
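For reference, a rough sketch of the system image route with PackageCompiler.jl. The image file name csv_sysimage.dll is a made-up example (.dll on Windows, .so on Linux), and the sketch assumes a test.jl that exercises the CSV calls you care about already exists in the current directory. The build itself takes several minutes, but it only happens once:

```julia
using Pkg
Pkg.add("PackageCompiler")

using PackageCompiler

# Bake CSV and DataFrames into a custom system image.
# precompile_execution_file is run during the build so that the
# methods it calls (e.g. CSV.read on a small file) are compiled ahead of time.
create_sysimage([:CSV, :DataFrames];
                sysimage_path = "csv_sysimage.dll",      # hypothetical output name
                precompile_execution_file = "test.jl")   # assumed to exist
```

Julia can then be started with the image, for example julia --sysimage csv_sysimage.dll test.jl, and using CSV, DataFrames plus the first CSV.read should return much faster than the roughly 15 seconds shown earlier in the thread. How much faster depends on how well the precompile script covers the real workload.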