Startup Speed

Hello everybody! I’ve been learning Julia and I’ve been loving it!!

I’m curious about startup times…

I have this super simple code, but I’m baffled at how it takes about 13 seconds to start! The 13 s is on my Apple M1, where Julia support still doesn’t seem to be fully stable, so I also tested it on an i9, and it took 18 s there.

All the code is doing is reading a CSV file and nothing else. After googling the issue, it seems like it’s a known issue due to Julia compiling at startup.

I was wondering how can that be helped? Some places mentioned a --precompile flag that seems to be deprecated, other talk about sysimages…

Is there a Julia way to deal with this issue? I’ve also heard about keeping a long-running Julia process and using it interactively from there, but then how do I run a file after making code changes? Would I need to worry about variables declared before? Is there a Julia client that, when run, just sends commands to the long-running Julia process?

For reference, this is the code I’m running:

@time using CSV
@time using DataFrames
@time using Dates

dateformat = "yyyy-mm-dd HH:MM:SS"
types = [DateTime, Float64, Float64, Float64, Float64, Int64]

@time data = CSV.read("data.txt", DataFrame; dateformat = dateformat, types = types)

and I’m running it by calling

julia code.jl

and these are the times that I’m getting

1.911461 seconds (5.97 M allocations: 363.832 MiB, 5.30% gc time, 88.35% compilation time)
0.745739 seconds (1.82 M allocations: 124.596 MiB, 4.18% gc time)
0.001233 seconds (293 allocations: 28.188 KiB)
10.758393 seconds (39.59 M allocations: 1.697 GiB, 4.92% gc time, 99.87% compilation time)

Thanks a lot for your help!

As far as I understand, DataFrames.jl and CSV.jl are heavily optimized for the benchmarks. This could (still?) have some repercussions if you try to work with small datasets…

Some tips here: Development workflow · JuliaNotes.jl

And you may be interested in GitHub - dmolina/DaemonMode.jl: Client-Daemon workflow to run faster scripts in Julia
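In case it helps, a minimal sketch of the DaemonMode.jl workflow, based on its README (flags and paths are just for illustration):

```shell
# Terminal 1: start a long-running Julia daemon that keeps compiled code in memory
julia --startup-file=no -e 'using DaemonMode; serve()' &

# Terminal 2: send your script to the daemon instead of starting a fresh Julia process
julia --startup-file=no -e 'using DaemonMode; runargs()' code.jl
```

Only the first run pays the compilation cost; later runs reuse the daemon’s already-compiled code.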

But basically: put the code inside functions, keep the session alive, and use Revise.
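For example, a sketch of that workflow — assuming the script is saved as analysis.jl (file and function names are just for illustration):

```julia
# analysis.jl — the script's logic wrapped in a function so that
# compiled code is reused across calls within one session.
using CSV, DataFrames, Dates

function load_data(path)
    fmt = dateformat"yyyy-mm-dd HH:MM:SS"   # DateFormat built once
    types = [DateTime, Float64, Float64, Float64, Float64, Int64]
    CSV.read(path, DataFrame; dateformat = fmt, types = types)
end

# In a long-lived REPL session you would then do something like:
#   using Revise
#   includet("analysis.jl")        # tracked include: edits are picked up automatically
#   data = load_data("data.txt")   # fast after the first call — no restart needed
```

Because `includet` tracks the file, you can edit `load_data`, call it again, and Revise reloads the changes without restarting Julia.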

2 Likes

Hi @Imiq,

Good tips as usual, but do you think this is it? As far as I remember, @Raf did some remarkable work on startup time here, and I’m not sure if it is in production already.

The Parsers.jl fix that sped up CSV.jl was merged, but DateTime precompilation was rolled back due to some bugs on Windows during precompilation that no one really understands.

That might be what you are hitting here? You can try pinning Parsers.jl to v2.2.2 to see if it was any faster before.
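If you want to try that, something along these lines should do it (a sketch using the standard Pkg API):

```julia
# Pin Parsers.jl at v2.2.2 in the active environment.
# Equivalent to typing `pin Parsers@2.2.2` in Pkg mode (press ] in the REPL).
using Pkg
Pkg.pin(name = "Parsers", version = "2.2.2")
```

Use `Pkg.free("Parsers")` afterwards to undo the pin.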

CSV.read will still take some time the first time, but it should be faster than 10 s on any newish machine.

1 Like

I recommend using either the REPL or Jupyter to keep the compiled libraries in memory, so new runs are instantaneous. You can even have a script that you include the first time you open the REPL (or a Jupyter notebook) so that everything you need gets compiled up front.
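A hypothetical warm-up script along those lines (the tiny in-memory sample CSV is just an illustration — match it to your real column types):

```julia
# warmup.jl — include this once per REPL/Jupyter session so the expensive
# first-call compilation happens up front rather than during real work.
using CSV, DataFrames, Dates

# A tiny in-memory CSV with the same column layout as the real data,
# just enough to exercise the parsing code paths.
df = CSV.read(IOBuffer("t,x\n2020-01-01 00:00:00,1.0\n"), DataFrame;
              dateformat = "yyyy-mm-dd HH:MM:SS", types = [DateTime, Float64])
```

After `include("warmup.jl")`, subsequent `CSV.read` calls on similarly-typed files should skip most of the compilation cost.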

1 Like

Completely unrelated to the timing question:

If you have variables with the same names as the keyword arguments, you can save some typing by doing this:

data = CSV.read("data.txt", DataFrame; dateformat, types)

and

dateformat strings have a dedicated macro, which prevents repeated conversion from a string to a ::DateFormat:

dateformat = dateformat"yyyy-mm-dd HH:MM:SS"

This reduces running time in some situations, although I doubt that is the case here & I haven’t checked.

see:
https://docs.julialang.org/en/v1/stdlib/Dates/#Dates.DateFormat
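A quick stdlib-only illustration of the difference:

```julia
using Dates

fmt = dateformat"yyyy-mm-dd HH:MM:SS"   # the DateFormat is built once, at parse time
dt  = DateTime("2021-06-01 12:30:00", fmt)
# Passing the raw string "yyyy-mm-dd HH:MM:SS" instead would rebuild
# the DateFormat on every call.
```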