Taking TTFX seriously: Can we make common packages faster to load and use

I’m talking about the reality of how it works now. By “breaks” precompilation I mean that precompiling unstable functions seems to give little or no reduction in TTFX compared to precompiling stable functions.

The basic reason for this is that with type-stable code, the compiler can trace all the functions your top-level code depends on and precompile all of them. But if the compiler has to trace unstable functions, it loses the ability to know which `MethodInstance`s get called, which means it can’t precompile the recursive dependencies.
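A minimal sketch of that difference (the functions here are made up purely for illustration):

```julia
# Hypothetical example: `unstable` returns either an Int or a String depending
# on a runtime *value*, so inference cannot know which method of `downstream`
# will be reached until the code actually runs.
unstable(x) = x > 0 ? 1 : "one"     # return type depends on a value
stable(x)   = x > 0 ? 1 : -1        # always returns Int

downstream(y::Int)    = y + 1
downstream(y::String) = y * "!"

# For `stable_pipeline`, inference sees that `downstream(::Int)` is reached,
# so precompiling it can also compile and cache the whole call chain.
stable_pipeline(x) = downstream(stable(x))

# For `unstable_pipeline`, the `downstream` MethodInstance is only resolved
# via dynamic dispatch at runtime, so precompilation cannot cache the callees.
unstable_pipeline(x) = downstream(unstable(x))
```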

6 Likes

The basic reason for this is that with type-stable code, the compiler can trace all the functions your top-level code depends on and precompile all of them. But if the compiler has to trace unstable functions, it loses the ability to know which `MethodInstance`s get called, which means it can’t precompile the recursive dependencies.

I get that it applies to precompile statements, but does it also apply to function calls?

1 Like

My only experience in developing packages is a little toy one I created just to learn more about the process; I have not yet found the need to develop my own “serious” package. So, in a sense, I feel like this thread is directed towards people like myself who will need a good deal of help to avoid common mistakes/programming patterns that increase TTFX. Rightly or wrongly, reading the comments gives the impression that there is a very high bar to clear.

Others (here and here) have come up with good “precompilation checklists” to go through for reducing start time. But now consider a “knowledge checklist” one may need in order to implement those suggestions (and the others above):

  • how to interpret the built-in profiler,
  • how to recognize type instabilities and understand @code_warntype,
  • how to read a ProfileView.jl flamegraph,
  • how to use Cthulhu.jl,
  • what causes an “invalidation”,
  • how to use SnoopCompile.jl,
  • how to “properly” use Requires.jl,
  • what makes code precompile-able in the first place,

and there are likely others I’ve missed. Some of these are fairly basic (interpreting the profiler and @code_warntype, reading flame graphs), but others may require more understanding.
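For the @code_warntype item on that list, here is a tiny made-up example of the kind of thing it flags:

```julia
# Hypothetical example function: `s` starts as an Int but becomes a Float64
# inside the loop, so its type cannot be inferred as concrete.
function sumto(n)
    s = 0               # Int...
    for i in 1:n
        s += i / 2      # ...now Float64: type-unstable accumulator
    end
    return s
end

# In the REPL:
#   @code_warntype sumto(10)
# reports `s::Union{Float64, Int64}` (highlighted in red).
# Initializing with `s = 0.0` makes the function type-stable.
```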

Looking at the entirety of the thread, what would help the most is an easy-to-follow tutorial, perhaps with a dummy package repo on GitHub, where each step is shown and explained in order from “low-hanging fruit” to “the hard stuff”. Because right now, again rightly or wrongly, I look at this thread and just think “wow, that’s a lot of work”, so anything to make the process seem more accessible would be huge.

17 Likes

I mean the first time a function is called.

Maybe this is stupid, but: I noticed that @inferred was never mentioned in this thread. I know that other tools allow for more fine-grained inspection, but shouldn’t @inferred catch at least a few type instabilities?

(I’m asking since I use it all the time)
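For what it’s worth, Test.@inferred does catch the simplest cases. A minimal sketch (the functions are made up for illustration):

```julia
using Test

# `@inferred f(x)` calls `f(x)` and throws if the inferred return type does
# not match the type of the actual result, so it can indeed catch basic
# type instabilities inside a test suite.
f_stable(x)   = x > 0 ? 1 : -1       # always Int: inference succeeds
f_unstable(x) = x > 0 ? 1 : "one"    # Union{Int64, String}: inference fails

@inferred f_stable(1)                # passes and returns 1

caught = try
    @inferred f_unstable(1)          # throws: inferred Union{Int64, String}
    false
catch
    true
end
```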

2 Likes

Inspired by this post on TTFX with CSV/DataFrames, I made a quick attempt at a function to run script files repeatedly.

using Statistics

"""
    ttfx(; code="sleep(1)", N=10, args = "", preview=false)

Compute the time to first X.

`ttfx` will run `code` `N` times to determine the total
startup cost of running certain packages/functions. `ttfx()`
with no arguments will simply run `sleep(1)` and may be used
to estimate the base Julia runtime cost.

`code` can either be a short snippet that will run with
the `-e` switch or the path of a file containing the script to be run.

`args` can be used to set Julia runtime options.

`preview = true` will show the final command without running it.

"""
function ttfx(; code="sleep(1)", N=10, args = "", preview=false)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = "-e '$code'"
    end

    # `cmd` doesn't interpolate properly or something,
    # so using `shell_parse` and `cmd_gen` is the workaround
    ex, = Base.shell_parse("julia $args $code")
    julia_cmd = Base.cmd_gen(eval(ex))

    # Return only the command that would have been run
    preview && return julia_cmd
    
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(`$julia_cmd`)
    end

    return median(times), times
end

# Run the default timing with sleep
t = ttfx()

# Run `using CSV` 15 times in the current project with CSV.jl installed and 8 threads
t = ttfx(code="using CSV", N=15, args="-t 8 --project=@.")

For me ttfx() takes a median time of ~1.17 seconds, so there is a baseline julia runtime cost of 0.17 seconds on my machine. Then using CSV on my computer takes a median time of 3.4 seconds.

Feel free to edit and expand this (perhaps into the @ctime macro that was suggested). Code suggestions welcome!

8 Likes

You can clean up the Cmd stuff by doing it like this:

function ttfx(; code="sleep(1)", N=10, args = String[], preview=false)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = `-e $code`
    end

    julia_cmd = `julia $args $code`

    # Return only the command that would have been run
    preview && return julia_cmd
    
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(julia_cmd)
    end

    return median(times), times
end

With this version, pass args as a list of strings, e.g. ttfx(; args=["-O1","--compile=min"]).

6 Likes

Or ttfx(args=`-O1 --compile=min`) works too.

5 Likes

Thanks! I initially tried using both backticks and @cmd to convert the string to a Cmd, but I was getting a strange error from Base.shell_parse, which is the reason for that note in my original function. But for some reason your new version doesn’t give me the same error…

Strings, commands, and arrays all interpolate into commands differently, and when you tried, you probably had something as a string that should’ve been a command, or something like that. I often end up using trial and error, although I really should just learn the rules. In the end, you should usually not need “manual quoting” like code = "-e '$code'", and you definitely should not need eval, Base.shell_parse, etc.
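For reference, a small sketch of the interpolation rules in question: a string interpolates into a Cmd as a single argument (no shell-style word splitting), while an array or another Cmd expands into multiple arguments.

```julia
# An array expands into separate arguments:
args = ["-O1", "--compile=min"]
cmd1 = `julia $args -e "1+1"`
# cmd1.exec == ["julia", "-O1", "--compile=min", "-e", "1+1"]

# A string stays one argument, spaces and all (here producing an
# invalid single flag "-O1 --compile=min"):
s = "-O1 --compile=min"
cmd2 = `julia $s -e "1+1"`
# cmd2.exec == ["julia", "-O1 --compile=min", "-e", "1+1"]
```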

1 Like

Ah, that’s good to know. Trial-and-error is how I ended up with what I had initially.

Here’s a minor rework, with a macro:

using Statistics
function ttfx(; code="sleep(1)", N=2, args = String[], preview=false)
    jl_startup = @elapsed run(`julia $args -e ""`)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = `-e $code`
    end
    julia_cmd = `julia $args $code`
    # Return only the command that would have been run
    preview && return julia_cmd
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(julia_cmd)
    end
    times .-= jl_startup
    return (mean=mean(times), times=times)
end
macro ttfx(code::String, N=1)
    # return an expression so the timing runs when the expanded code runs,
    # not at macro-expansion time
    :(ttfx(; code=$code, N=$N))
end

@ttfx "your code here" 10

1 Like

As an update, TTFX for Blink.jl, Interact.jl, JSON.jl, CSV.jl, and I would guess many other packages is largely dependent on the compilation time of Parsers.jl:

https://github.com/JuliaData/Parsers.jl/pull/108
https://github.com/JuliaIO/JSON.jl/pull/337

With 2200 dependent packages, Parsers.jl is probably responsible for a good fraction of the compilation time in the ecosystem.

My takeaway is not to fix your package’s precompile time, but the lowest-level dependency that is causing the problem. But it’s not always so easy to find where that is, because it’s buried so deep in the stack and usually doesn’t show up in the flame graph. And to take the top of the @snoopi_deep graph more seriously! I could have found this a lot more quickly in hindsight.

19 Likes

Why would something not show up in the flame graph if it takes such a big chunk of the run time?

While we should definitely continue to strive for lower TTFX, I think we should also clearly advertise the solution of using PackageCompiler to create a sysimage. It appears to me that this is a very practical and low-effort method to eliminate TTFX. Of course, creating a sysimage introduces a small inconvenience in the development workflow. However, I have found that in a typical project the set of packages I use stabilizes pretty early on, so there is no need to create a sysimage frequently. Also, importantly, it works perfectly for application deployment, where no new code is added.

Deeper functions don’t show up because they’re taking compile time, not run time. So the flame graph will usually show functions further out that call the slow function, or nested functions that eventually call it.

Blink/Interact (partly via AssetRegistry) will show time in JSON.Parser.parse, and adding precompilation there does help some of the TTFX. But the real problem is in Parsers.jl, 5 method calls deeper. It’s only used on one line to parse floats, and AssetRegistry doesn’t read or write floats. So it’s not exactly what you look for first, unless you know the packages already.

PackageCompiler still takes some knowledge and the compile time problem is most important for the experience of newcomers.

If someone who uses R tries out Julia, they won’t be using package compiler straight away. So loading a tiny CSV and plotting a column from it will take half a minute for them, and Julia won’t feel at all “fast”.

And at the other end, as a developer, package compiler isn’t so useful.

15 Likes

I don’t think custom sysimages should be recommended for general use, before integration with Pkg is improved. See the “drawbacks” mentioned in the page you linked.

With a custom sysimage, as soon as you add a package to your project, you cannot trust Pkg (or the TOML files) anymore about which package versions you are using (you cannot even trust that the versions used are compatible).

It’s OK if you are disciplined and know what you are doing, but if people start using this without second thoughts… Imagine supposedly reproducible publications where it turns out the listed package versions are wrong because of this. This will hurt not only the researchers but also Julia’s reputation.

14 Likes

OK, then you mean the @profile flame graph, not the @snoopi_deep flame graph? I thought you were using the latter; it should show those methods, I think.